[2025-01-20 23:01:54,889 I 13636 13636] (raylet) main.cc:180: Setting cluster ID to: 7cd267f3e9ae7d263565d743c4adec3db9f4c00e2d94b48fe89053a6
[2025-01-20 23:01:54,898 I 13636 13636] (raylet) main.cc:289: Raylet is not set to kill unknown children.
[2025-01-20 23:01:54,898 I 13636 13636] (raylet) io_service_pool.cc:35: IOServicePool is running with 1 io_service.
[2025-01-20 23:01:54,899 I 13636 13636] (raylet) main.cc:419: Setting node ID node_id=94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3
[2025-01-20 23:01:54,899 I 13636 13636] (raylet) store_runner.cc:32: Allowing the Plasma store to use up to 2.14748GB of memory.
[2025-01-20 23:01:54,900 I 13636 13636] (raylet) store_runner.cc:48: Starting object store with directory /dev/shm, fallback /tmp/ray, and huge page support disabled
[2025-01-20 23:01:54,900 I 13636 13664] (raylet) dlmalloc.cc:154: create_and_mmap_buffer(2147483656, /dev/shm/plasmaXXXXXX)
[2025-01-20 23:01:54,902 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump:
Current usage: 0 / 2.14748 GB
- num bytes created total: 0
0 pending objects of total size 0MB
- objects spillable: 0
- bytes spillable: 0
- objects unsealed: 0
- bytes unsealed: 0
- objects in use: 0
- bytes in use: 0
- objects evictable: 0
- bytes evictable: 0
- objects created by worker: 0
- bytes created by worker: 0
- objects restored: 0
- bytes restored: 0
- objects received: 0
- bytes received: 0
- objects errored: 0
- bytes errored: 0
[2025-01-20 23:01:55,906 I 13636 13636] (raylet) grpc_server.cc:134: ObjectManager server started, listening on port 43469.
[2025-01-20 23:01:55,910 I 13636 13636] (raylet) worker_killing_policy.cc:101: Running GroupByOwner policy.
[2025-01-20 23:01:55,910 I 13636 13636] (raylet) memory_monitor.cc:47: MemoryMonitor initialized with usage threshold at 94999994368 bytes (0.95 system memory), total system memory bytes: 99999997952
[2025-01-20 23:01:55,910 I 13636 13636] (raylet) node_manager.cc:287: Initializing NodeManager node_id=94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3
[2025-01-20 23:01:55,912 I 13636 13636] (raylet) grpc_server.cc:134: NodeManager server started, listening on port 34453.
[2025-01-20 23:01:55,919 I 13636 13730] (raylet) agent_manager.cc:77: Monitor agent process with name dashboard_agent/424238335
[2025-01-20 23:01:55,920 I 13636 13732] (raylet) agent_manager.cc:77: Monitor agent process with name runtime_env_agent
[2025-01-20 23:01:55,920 I 13636 13636] (raylet) event.cc:493: Ray Event initialized for RAYLET
[2025-01-20 23:01:55,920 I 13636 13636] (raylet) event.cc:324: Set ray event level to warning
[2025-01-20 23:01:55,922 I 13636 13636] (raylet) raylet.cc:134: Raylet of id, 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 started. Raylet consists of node_manager and object_manager.
node_manager address: 192.168.0.2:34453 object_manager address: 192.168.0.2:43469 hostname: 0cd925b1f73b [2025-01-20 23:01:55,925 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{accelerator_type:A40: 10000, node:192.168.0.2: 10000, CPU: 200000, GPU: 20000, object_store_memory: 21474836480000, memory: 844966629380000, node:__internal_head__: 10000}}, 
"available": {accelerator_type:A40: 10000, node:192.168.0.2: 10000, CPU: 200000, GPU: 20000, object_store_memory: 21474836480000, memory: 844966629380000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s 
[state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 70307155418844000.000 [state-dump] - num location lookups per second: 70307155418832000.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 0 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 0 [state-dump] - num PYTHON drivers: 0 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 0 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 28 total (13 active) [state-dump] Queueing time: mean = 1.299 ms, max = 
9.915 ms, min = 28.359 us, total = 36.378 ms [state-dump] Execution time: mean = 36.773 ms, total = 1.030 s [state-dump] Event stats: [state-dump] PeriodicalRunner.RunFnPeriodically - 11 total (2 active, 1 running), Execution time: mean = 173.472 us, total = 1.908 ms, Queueing time: mean = 3.282 ms, max = 9.915 ms, min = 28.359 us, total = 36.100 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.UpdateAvailableMemory - 1 total (0 active), Execution time: mean = 2.252 us, total = 2.252 us, Queueing time: mean = 28.471 us, max = 28.471 us, min = 28.471 us, total = 28.471 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 1 
total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 1 total (0 active), Execution time: mean = 1.303 ms, total = 1.303 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] NodeManager.deadline_timer.record_metrics - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 
0.000 s
[state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] DebugString() time ms: 0
[state-dump]
[state-dump]
[2025-01-20 23:01:55,927 I 13636 13636] (raylet) accessor.cc:762: Received notification for node, IsAlive = 1 node_id=94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3
[2025-01-20 23:01:56,011 I 13636 13636] (raylet) worker_pool.cc:501: Started worker process with pid 13768, the token is 0
[2025-01-20 23:01:56,015 I 13636 13636] (raylet) worker_pool.cc:501: Started worker process with pid 13769, the token is 1
[2025-01-20 23:01:56,017 I 13636 13636] (raylet) worker_pool.cc:501: Started worker process with pid 13770, the token is 2
[2025-01-20 23:01:56,019 I 13636 13636] (raylet) worker_pool.cc:501: Started worker process with pid 13771, the token is 3
[2025-01-20 23:01:56,021 I 13636 13636] (raylet) worker_pool.cc:501: Started worker process with pid 13772, the token is 4
[2025-01-20 23:01:56,023 I 13636 13636] (raylet) worker_pool.cc:501: Started worker process with pid 13773, the token is 5
[2025-01-20 23:01:56,025 I 13636 13636] (raylet) worker_pool.cc:501: Started worker process with pid 13774, the token is 6
[2025-01-20 23:01:56,027 I 13636 13636] (raylet) worker_pool.cc:501: Started worker process with pid 13775, the token is 7
[2025-01-20 23:01:56,029 I 13636 13636] (raylet) worker_pool.cc:501: Started worker process with pid 13776, the token is 8
[2025-01-20 23:01:56,031 I 13636 13636] (raylet) worker_pool.cc:501: Started worker process with pid 13777, the token is 9
[2025-01-20 23:01:56,033 I 13636 13636] (raylet) worker_pool.cc:501: Started worker process with pid 13778, the token is 10
[2025-01-20 23:01:56,035 I 13636 13636] (raylet) worker_pool.cc:501: Started worker process with pid 13779, the token is 11
[2025-01-20 23:01:56,037 I 13636 13636] (raylet) worker_pool.cc:501: Started worker process with pid 13780, the token is 12
[2025-01-20 23:01:56,039 I 13636 13636] (raylet) worker_pool.cc:501: Started worker process with pid 13781, the token is 13
[2025-01-20 23:01:56,041 I 13636 13636] (raylet) worker_pool.cc:501: Started worker process with pid 13782, the token is 14
[2025-01-20 23:01:56,043 I 13636 13636] (raylet) worker_pool.cc:501: Started worker process with pid 13783, the token is 15
[2025-01-20 23:01:56,045 I 13636 13636] (raylet) worker_pool.cc:501: Started worker process with pid 13784, the token is 16
[2025-01-20 23:01:56,047 I 13636 13636] (raylet) worker_pool.cc:501: Started worker process with pid 13785, the token is 17
[2025-01-20 23:01:56,050 I 13636 13636] (raylet) worker_pool.cc:501: Started worker process with pid 13786, the token is 18
[2025-01-20 23:01:56,053 I 13636 13636] (raylet) worker_pool.cc:501: Started worker process with pid 13787, the token is 19
[2025-01-20 23:01:56,750 I 13636 13664] (raylet) object_store.cc:35: Object store current usage 8e-09 / 2.14748 GB.
[2025-01-20 23:01:56,927 I 13636 13636] (raylet) worker_pool.cc:692: Job 01000000 already started in worker pool.
[2025-01-20 23:02:04,914 W 13636 13658] (raylet) metric_exporter.cc:105: [1] Export metrics to agent failed: RpcError: RPC Error message: failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:64875: Failed to connect to remote host: Connection refused; RPC Error details: . This won't affect Ray, but you can lose metrics from the cluster.
[2025-01-20 23:02:54,903 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:02:55,928 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [190000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], 
node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 0 Cluster resources: node id: -2729954820020547849{"total":{accelerator_type:A40: 10000, GPU: 20000, object_store_memory: 21474836480000, CPU: 200000, memory: 844966629380000, node:__internal_head__: 10000, node:192.168.0.2: 10000}}, "available": {node:__internal_head__: 10000, GPU: 20000, object_store_memory: 21474836480000, CPU: 190000, memory: 844966629380000, node:192.168.0.2: 10000, accelerator_type:A40: 10000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 1 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] - (language=PYTHON actor_or_task=process_single_file pid=13775 worker_id=f4cba37f04bbcfd7d5921e503f687e65f311e73c798427b4edd6e4bc): {CPU: 10000} [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] - {depth=1 function_descriptor={type=PythonFunctionDescriptor, module_name=__main__, class_name=, function_name=process_single_file, function_hash=e8eb034cf2de482d93ec82443b81bbe5} scheduling_strategy=default_scheduling_strategy { [state-dump] } [state-dump] resource_set={CPU : 1, }}: 1/20 [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects 
pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 
inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 19 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 
[state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 5609 total (35 active) [state-dump] Queueing time: mean = 624.517 us, max = 871.816 ms, min = 75.000 ns, total = 3.503 s [state-dump] Execution time: mean = 556.376 us, total = 3.121 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 1258 total (0 active), Execution time: mean = 558.162 us, total = 702.167 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 1258 total (0 active), Execution time: mean = 44.830 us, total = 56.396 ms, Queueing time: mean = 110.794 us, max = 291.782 us, min = 12.669 us, total = 139.378 ms [state-dump] RaySyncer.OnDemandBroadcasting - 600 total (1 active), Execution time: mean = 17.260 us, total = 10.356 ms, Queueing time: mean = 135.534 us, max = 25.869 ms, min = 11.266 us, total = 81.321 ms [state-dump] NodeManager.CheckGC - 600 total (1 active), Execution time: mean = 3.123 us, total = 1.874 ms, Queueing time: mean = 148.796 us, max = 25.875 ms, min = 10.923 us, total = 89.278 ms [state-dump] ObjectManager.UpdateAvailableMemory - 600 total (0 active), Execution time: mean = 6.655 us, total = 3.993 ms, Queueing time: mean = 113.930 us, max = 425.405 us, min = 5.019 us, total = 68.358 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 300 total (1 active), Execution time: mean = 20.804 us, total = 6.241 ms, Queueing time: mean = 175.261 us, max = 26.386 ms, min = 13.416 us, total = 52.578 ms [state-dump] 
MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 240 total (1 active), Execution time: mean = 468.632 us, total = 112.472 ms, Queueing time: mean = 77.819 us, max = 255.439 us, min = 20.261 us, total = 18.676 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 94 total (21 active), Execution time: mean = 7.559 us, total = 710.561 us, Queueing time: mean = 31.393 ms, max = 871.816 ms, min = 29.537 us, total = 2.951 s [state-dump] ClientConnection.async_read.ProcessMessage - 73 total (0 active), Execution time: mean = 762.375 us, total = 55.653 ms, Queueing time: mean = 18.964 us, max = 247.615 us, min = 4.453 us, total = 1.384 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 60 total (1 active), Execution time: mean = 17.442 us, total = 1.047 ms, Queueing time: mean = 69.959 us, max = 151.945 us, min = 10.666 us, total = 4.198 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 60 total (0 active), Execution time: mean = 664.220 us, total = 39.853 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 60 total (1 active), Execution time: mean = 8.787 us, total = 527.232 us, Queueing time: mean = 174.246 us, max = 1.293 ms, min = 10.538 us, total = 10.455 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 60 total (0 active), Execution time: mean = 122.526 us, total = 7.352 ms, Queueing time: mean = 116.991 us, max = 209.375 us, min = 25.598 us, total = 7.019 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 60 total (1 active), Execution time: mean = 3.181 us, total = 190.863 us, Queueing time: mean = 178.337 us, max = 1.288 ms, min = 5.573 us, total = 10.700 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us 
[state-dump] ClusterResourceManager.ResetRemoteNodeView - 21 total (1 active), Execution time: mean = 8.763 us, total = 184.023 us, Queueing time: mean = 69.945 us, max = 178.398 us, min = 33.100 us, total = 1.469 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 12 total (0 active), Execution time: mean = 1.524 ms, total = 18.288 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 12 total (1 active), Execution time: mean = 529.166 us, total = 6.350 ms, Queueing time: mean = 329.140 us, max = 1.047 ms, min = 11.295 us, total = 3.950 ms [state-dump] NodeManager.GcsCheckAlive - 12 total (1 active), Execution time: mean = 263.721 us, total = 3.165 ms, Queueing time: mean = 566.002 us, max = 1.280 ms, min = 34.942 us, total = 6.792 ms [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 12 total (0 active), Execution time: mean = 53.518 us, total = 642.214 us, Queueing time: mean = 116.418 us, max = 167.932 us, min = 30.456 us, total = 1.397 ms [state-dump] - 11 total (0 active), Execution time: mean = 1.129 us, total = 12.419 us, Queueing time: mean = 115.521 us, max = 207.332 us, min = 32.817 us, total = 1.271 ms [state-dump] RaySyncer.BroadcastMessage - 11 total (0 active), Execution time: mean = 231.476 us, total = 2.546 ms, Queueing time: mean = 819.182 ns, max = 1.330 us, min = 180.000 ns, total = 9.011 us [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 9 total (0 active), Execution time: mean = 712.218 us, total = 6.410 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 9 total (0 active), Execution time: mean = 132.680 us, total = 1.194 ms, Queueing time: mean = 118.144 us, max = 137.712 us, min = 107.146 us, total = 1.063 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 6 total (1 active), Execution time: mean = 1.557 ms, total = 9.344 ms, Queueing time: mean = 47.845 us, max = 73.081 us, min = 18.059 us, total = 287.072 us [state-dump] RaySyncerRegister - 2 total (0 
active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 502.391 ms, total = 1.005 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 1 total (1 active, 1 running), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 235.496 us, total = 235.496 us, Queueing time: mean = 20.299 us, max = 20.299 us, min = 20.299 us, total = 20.299 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 
2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] 
ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 23:03:54,903 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:03:55,931 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: 
[844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative 
spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - 
num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed 
publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 10846 total (35 active) [state-dump] Queueing time: mean = 5.878 ms, max = 59.826 s, min = -0.000 s, total = 63.749 s [state-dump] Execution time: mean = 376.875 us, total = 4.088 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 2518 total (0 active), Execution time: mean = 555.583 us, total = 1.399 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 2518 total (0 active), Execution time: mean = 42.345 us, total = 106.626 ms, Queueing time: mean = 112.189 us, max = 1.154 ms, min = 8.980 us, total = 282.492 ms [state-dump] NodeManager.CheckGC - 1199 total (1 active), Execution time: mean = 3.020 us, total = 3.621 ms, Queueing time: mean = 125.862 us, max = 25.875 ms, min = 10.923 us, total = 150.908 ms [state-dump] RaySyncer.OnDemandBroadcasting - 1199 total (1 active), Execution time: mean = 14.382 us, total = 17.243 ms, Queueing time: mean = 115.400 us, max = 25.869 ms, min = 11.266 us, total = 138.364 ms [state-dump] ObjectManager.UpdateAvailableMemory - 1199 total (0 active), Execution time: mean = 6.480 us, total = 7.769 ms, Queueing time: mean = 112.157 us, max = 546.539 us, min = 4.315 us, total = 134.477 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 600 total (1 active), Execution time: mean = 20.157 us, total = 12.094 ms, Queueing time: mean = 126.578 us, max = 26.386 ms, min = 13.416 us, total = 75.947 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 479 total (1 active), Execution time: mean = 468.264 us, total = 224.299 ms, Queueing time: mean = 75.966 us, max = 255.439 us, min = -0.000 s, total = 36.388 ms 
[state-dump] NodeManagerService.grpc_server.GetResourceLoad - 120 total (0 active), Execution time: mean = 656.073 us, total = 78.729 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 120 total (1 active), Execution time: mean = 16.947 us, total = 2.034 ms, Queueing time: mean = 93.639 us, max = 2.581 ms, min = 10.666 us, total = 11.237 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 120 total (0 active), Execution time: mean = 111.219 us, total = 13.346 ms, Queueing time: mean = 121.363 us, max = 209.375 us, min = 25.598 us, total = 14.564 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 120 total (1 active), Execution time: mean = 3.132 us, total = 375.888 us, Queueing time: mean = 176.536 us, max = 1.288 ms, min = 5.573 us, total = 21.184 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 120 total (1 active), Execution time: mean = 8.741 us, total = 1.049 ms, Queueing time: mean = 172.523 us, max = 1.293 ms, min = 10.538 us, total = 20.703 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 41 total (1 active), Execution time: mean = 9.260 us, total = 379.680 us, Queueing time: mean = 82.558 us, max = 283.776 us, min = 33.100 us, total = 3.385 ms [state-dump] NodeManager.deadline_timer.record_metrics - 24 total (1 active), Execution time: mean = 539.686 us, total = 12.952 ms, Queueing time: mean = 333.359 us, max = 1.047 ms, min = 11.295 us, total = 
8.001 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 24 total (0 active), Execution time: mean = 1.533 ms, total = 36.798 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 24 total (1 active), Execution time: mean = 268.465 us, total = 6.443 ms, Queueing time: mean = 588.938 us, max = 1.280 ms, min = 34.942 us, total = 14.135 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 24 total (0 active), Execution time: mean = 53.341 us, total = 1.280 ms, Queueing time: mean = 119.986 us, max = 167.932 us, min = 30.456 us, total = 2.880 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 
active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManager.deadline_timer.debug_state_dump - 12 total (1 active), Execution time: mean = 1.600 ms, total = 19.197 ms, Queueing time: mean = 55.945 us, max = 93.752 us, min = 18.059 us, total = 671.337 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] 
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 502.391 ms, total = 1.005 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 2 total (1 active, 1 running), Execution time: mean = 1.597 ms, total = 3.193 ms, Queueing time: mean = 32.269 us, max = 64.537 us, min = 64.537 us, total = 64.537 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, 
total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 235.496 us, total = 235.496 us, Queueing time: mean = 20.299 us, max = 20.299 us, min = 20.299 us, total = 20.299 us [state-dump] DebugString() time ms: 1 
[state-dump] [state-dump] [2025-01-20 23:04:54,903 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:04:55,933 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], 
node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object 
pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max 
timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 16076 total (35 active) 
[state-dump] Queueing time: mean = 3.990 ms, max = 59.826 s, min = -0.000 s, total = 64.147 s [state-dump] Execution time: mean = 312.652 us, total = 5.026 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 3777 total (0 active), Execution time: mean = 550.801 us, total = 2.080 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 3777 total (0 active), Execution time: mean = 41.550 us, total = 156.935 ms, Queueing time: mean = 111.449 us, max = 1.154 ms, min = 8.980 us, total = 420.942 ms [state-dump] NodeManager.CheckGC - 1798 total (1 active), Execution time: mean = 2.920 us, total = 5.249 ms, Queueing time: mean = 115.679 us, max = 25.875 ms, min = 10.923 us, total = 207.991 ms [state-dump] RaySyncer.OnDemandBroadcasting - 1798 total (1 active), Execution time: mean = 12.813 us, total = 23.037 ms, Queueing time: mean = 106.680 us, max = 25.869 ms, min = 11.266 us, total = 191.810 ms [state-dump] ObjectManager.UpdateAvailableMemory - 1798 total (0 active), Execution time: mean = 6.335 us, total = 11.390 ms, Queueing time: mean = 113.346 us, max = 546.539 us, min = 3.813 us, total = 203.796 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 900 total (1 active), Execution time: mean = 19.035 us, total = 17.132 ms, Queueing time: mean = 108.894 us, max = 26.386 ms, min = 13.416 us, total = 98.004 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 719 total (1 active), Execution time: mean = 459.641 us, total = 330.482 ms, Queueing time: mean = 76.312 us, max = 1.481 ms, min = -0.000 s, total = 54.869 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 180 total (0 active), Execution time: mean = 647.774 us, total = 116.599 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.ScheduleAndDispatchTasks 
- 180 total (1 active), Execution time: mean = 16.062 us, total = 2.891 ms, Queueing time: mean = 87.632 us, max = 2.581 ms, min = 10.666 us, total = 15.774 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 180 total (0 active), Execution time: mean = 107.072 us, total = 19.273 ms, Queueing time: mean = 118.778 us, max = 211.602 us, min = 25.598 us, total = 21.380 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 180 total (1 active), Execution time: mean = 3.083 us, total = 555.004 us, Queueing time: mean = 165.290 us, max = 1.288 ms, min = 5.573 us, total = 29.752 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 180 total (1 active), Execution time: mean = 8.612 us, total = 1.550 ms, Queueing time: mean = 161.361 us, max = 1.293 ms, min = 10.420 us, total = 29.045 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 61 total (1 active), Execution time: mean = 8.851 us, total = 539.914 us, Queueing time: mean = 79.938 us, max = 283.776 us, min = 24.809 us, total = 4.876 ms [state-dump] NodeManager.deadline_timer.record_metrics - 36 total (1 active), Execution time: mean = 541.215 us, total = 19.484 ms, Queueing time: mean = 284.343 us, max = 1.047 ms, min = 11.295 us, total = 10.236 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 36 total (0 active), Execution time: mean = 1.500 ms, total = 53.991 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 
36 total (1 active), Execution time: mean = 259.791 us, total = 9.352 ms, Queueing time: mean = 554.116 us, max = 1.280 ms, min = 34.942 us, total = 19.948 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 36 total (0 active), Execution time: mean = 51.600 us, total = 1.858 ms, Queueing time: mean = 116.595 us, max = 167.932 us, min = 30.456 us, total = 4.197 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 18 total (1 active), Execution time: mean = 1.568 ms, total = 28.227 ms, Queueing time: mean = 56.670 us, max = 93.752 us, min = 18.059 us, total = 1.020 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, 
Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 3 total (1 active, 1 running), Execution time: mean = 2.023 ms, total = 6.070 ms, Queueing time: mean = 43.190 us, max = 65.032 us, min = 64.537 us, total = 129.569 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 502.391 ms, total = 1.005 s, 
Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 235.496 us, total = 235.496 us, Queueing time: mean = 20.299 us, max = 20.299 us, min = 20.299 us, total = 20.299 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, 
min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:05:54,904 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects 
in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:05:55,936 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 
21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] 
Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 21309 total (35 active) [state-dump] Queueing time: mean = 3.030 ms, max 
= 59.826 s, min = -0.000 s, total = 64.571 s [state-dump] Execution time: mean = 281.412 us, total = 5.997 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 5036 total (0 active), Execution time: mean = 552.469 us, total = 2.782 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 5036 total (0 active), Execution time: mean = 41.498 us, total = 208.983 ms, Queueing time: mean = 112.039 us, max = 1.154 ms, min = 8.980 us, total = 564.229 ms [state-dump] NodeManager.CheckGC - 2398 total (1 active), Execution time: mean = 2.914 us, total = 6.987 ms, Queueing time: mean = 111.397 us, max = 25.875 ms, min = 10.923 us, total = 267.129 ms [state-dump] RaySyncer.OnDemandBroadcasting - 2398 total (1 active), Execution time: mean = 12.156 us, total = 29.149 ms, Queueing time: mean = 103.052 us, max = 25.869 ms, min = 11.266 us, total = 247.119 ms [state-dump] ObjectManager.UpdateAvailableMemory - 2398 total (0 active), Execution time: mean = 6.347 us, total = 15.219 ms, Queueing time: mean = 114.420 us, max = 546.539 us, min = 3.776 us, total = 274.380 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 1200 total (1 active), Execution time: mean = 19.021 us, total = 22.826 ms, Queueing time: mean = 101.674 us, max = 26.386 ms, min = 13.416 us, total = 122.009 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 959 total (1 active), Execution time: mean = 460.231 us, total = 441.362 ms, Queueing time: mean = 77.036 us, max = 1.481 ms, min = -0.000 s, total = 73.878 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 240 total (0 active), Execution time: mean = 640.321 us, total = 153.677 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 240 total (1 active), Execution time: mean = 
16.003 us, total = 3.841 ms, Queueing time: mean = 85.561 us, max = 2.581 ms, min = 10.666 us, total = 20.535 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 240 total (0 active), Execution time: mean = 104.892 us, total = 25.174 ms, Queueing time: mean = 118.230 us, max = 222.344 us, min = 25.598 us, total = 28.375 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 240 total (1 active), Execution time: mean = 3.099 us, total = 743.865 us, Queueing time: mean = 172.736 us, max = 1.288 ms, min = 5.573 us, total = 41.457 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 240 total (1 active), Execution time: mean = 8.793 us, total = 2.110 ms, Queueing time: mean = 168.714 us, max = 1.293 ms, min = 10.420 us, total = 40.491 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 81 total (1 active), Execution time: mean = 9.121 us, total = 738.824 us, Queueing time: mean = 83.416 us, max = 283.776 us, min = 24.809 us, total = 6.757 ms [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.record_metrics - 48 total (1 active), Execution time: mean = 549.325 us, total = 26.368 ms, Queueing time: mean = 313.697 us, max = 1.047 ms, min = 11.295 us, total = 15.057 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 48 total (0 active), Execution time: mean = 1.513 ms, total = 72.646 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 48 total (1 active), Execution time: mean = 
270.392 us, total = 12.979 ms, Queueing time: mean = 586.660 us, max = 1.280 ms, min = 34.942 us, total = 28.160 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 48 total (0 active), Execution time: mean = 52.207 us, total = 2.506 ms, Queueing time: mean = 117.694 us, max = 191.489 us, min = 28.758 us, total = 5.649 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 24 total (1 active), Execution time: mean = 1.633 ms, total = 39.197 ms, Queueing time: mean = 58.554 us, max = 93.752 us, min = 18.059 us, total = 1.405 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 
207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 4 total (1 active, 1 running), Execution time: mean = 2.166 ms, total = 8.665 ms, Queueing time: mean = 58.150 us, max = 103.030 us, min = 64.537 us, total = 232.599 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 502.391 ms, total = 1.005 s, Queueing time: mean = 0.000 s, max = -0.000 
s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 235.496 us, total = 235.496 us, Queueing time: mean = 20.299 us, max = 20.299 us, min = 20.299 us, total = 20.299 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s 
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:06:54,904 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects 
evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:06:55,939 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, 
node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, 
max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 26540 total (35 active) [state-dump] Queueing time: mean = 2.448 ms, max 
= 59.826 s, min = -0.000 s, total = 64.972 s [state-dump] Execution time: mean = 261.258 us, total = 6.934 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 6296 total (0 active), Execution time: mean = 549.647 us, total = 3.461 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 6296 total (0 active), Execution time: mean = 41.327 us, total = 260.196 ms, Queueing time: mean = 111.575 us, max = 1.154 ms, min = 8.302 us, total = 702.476 ms [state-dump] NodeManager.CheckGC - 2997 total (1 active), Execution time: mean = 2.856 us, total = 8.560 ms, Queueing time: mean = 108.489 us, max = 25.875 ms, min = 10.923 us, total = 325.141 ms [state-dump] RaySyncer.OnDemandBroadcasting - 2997 total (1 active), Execution time: mean = 11.689 us, total = 35.031 ms, Queueing time: mean = 100.551 us, max = 25.869 ms, min = 11.266 us, total = 301.352 ms [state-dump] ObjectManager.UpdateAvailableMemory - 2997 total (0 active), Execution time: mean = 6.280 us, total = 18.821 ms, Queueing time: mean = 113.657 us, max = 708.376 us, min = 3.776 us, total = 340.629 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 1500 total (1 active), Execution time: mean = 18.627 us, total = 27.941 ms, Queueing time: mean = 96.300 us, max = 26.386 ms, min = 13.416 us, total = 144.450 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 1198 total (1 active), Execution time: mean = 457.377 us, total = 547.937 ms, Queueing time: mean = 76.703 us, max = 1.481 ms, min = -0.000 s, total = 91.890 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 300 total (0 active), Execution time: mean = 638.535 us, total = 191.560 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 300 total (1 active), Execution time: mean = 
15.812 us, total = 4.744 ms, Queueing time: mean = 83.700 us, max = 2.581 ms, min = 10.666 us, total = 25.110 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 300 total (0 active), Execution time: mean = 104.012 us, total = 31.204 ms, Queueing time: mean = 116.505 us, max = 222.344 us, min = 18.847 us, total = 34.952 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 300 total (1 active), Execution time: mean = 3.087 us, total = 926.121 us, Queueing time: mean = 170.027 us, max = 1.288 ms, min = 5.573 us, total = 51.008 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 300 total (1 active), Execution time: mean = 8.700 us, total = 2.610 ms, Queueing time: mean = 166.091 us, max = 1.293 ms, min = 6.780 us, total = 49.827 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 101 total (1 active), Execution time: mean = 9.101 us, total = 919.232 us, Queueing time: mean = 83.552 us, max = 283.776 us, min = 24.809 us, total = 8.439 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.record_metrics - 60 total (1 active), Execution time: mean = 537.973 us, total = 32.278 ms, Queueing time: mean = 317.350 us, max = 1.047 ms, min = 11.295 us, total = 19.041 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 60 total (0 active), Execution time: mean = 1.497 ms, total = 89.844 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 60 total (1 active), Execution time: mean = 
267.422 us, total = 16.045 ms, Queueing time: mean = 581.605 us, max = 1.280 ms, min = 34.942 us, total = 34.896 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 60 total (0 active), Execution time: mean = 51.044 us, total = 3.063 ms, Queueing time: mean = 114.502 us, max = 191.489 us, min = 16.013 us, total = 6.870 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 30 total (1 active), Execution time: mean = 1.633 ms, total = 48.982 ms, Queueing time: mean = 58.681 us, max = 93.752 us, min = 18.037 us, total = 1.760 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 
207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 5 total (1 active, 1 running), Execution time: mean = 2.272 ms, total = 11.359 ms, Queueing time: mean = 55.564 us, max = 103.030 us, min = 45.223 us, total = 277.822 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 502.391 ms, total = 1.005 s, Queueing time: mean = 0.000 s, max = 
-0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 235.496 us, total = 235.496 us, Queueing time: mean = 20.299 us, max = 20.299 us, min = 20.299 us, total = 20.299 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 
0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:07:54,904 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:07:55,941 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, 
node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, 
max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 31775 total (35 active) [state-dump] Queueing time: mean = 2.058 ms, max 
= 59.826 s, min = -0.000 s, total = 65.390 s [state-dump] Execution time: mean = 248.590 us, total = 7.899 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 7556 total (0 active), Execution time: mean = 551.221 us, total = 4.165 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 7556 total (0 active), Execution time: mean = 40.926 us, total = 309.237 ms, Queueing time: mean = 112.416 us, max = 1.815 ms, min = 8.302 us, total = 849.418 ms [state-dump] NodeManager.CheckGC - 3597 total (1 active), Execution time: mean = 2.856 us, total = 10.271 ms, Queueing time: mean = 106.837 us, max = 25.875 ms, min = 10.923 us, total = 384.292 ms [state-dump] RaySyncer.OnDemandBroadcasting - 3597 total (1 active), Execution time: mean = 11.560 us, total = 41.582 ms, Queueing time: mean = 99.030 us, max = 25.869 ms, min = 11.266 us, total = 356.211 ms [state-dump] ObjectManager.UpdateAvailableMemory - 3597 total (0 active), Execution time: mean = 6.269 us, total = 22.549 ms, Queueing time: mean = 113.523 us, max = 708.376 us, min = 3.203 us, total = 408.343 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 1800 total (1 active), Execution time: mean = 18.688 us, total = 33.639 ms, Queueing time: mean = 92.232 us, max = 26.386 ms, min = 10.737 us, total = 166.018 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 1438 total (1 active), Execution time: mean = 456.348 us, total = 656.228 ms, Queueing time: mean = 76.828 us, max = 1.481 ms, min = -0.000 s, total = 110.479 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 360 total (0 active), Execution time: mean = 634.285 us, total = 228.342 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 360 total (1 active), Execution time: mean 
= 16.110 us, total = 5.800 ms, Queueing time: mean = 83.038 us, max = 2.581 ms, min = 10.666 us, total = 29.894 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 360 total (0 active), Execution time: mean = 103.172 us, total = 37.142 ms, Queueing time: mean = 115.506 us, max = 238.260 us, min = 18.847 us, total = 41.582 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 360 total (1 active), Execution time: mean = 3.070 us, total = 1.105 ms, Queueing time: mean = 172.625 us, max = 1.319 ms, min = 5.573 us, total = 62.145 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 360 total (1 active), Execution time: mean = 8.703 us, total = 3.133 ms, Queueing time: mean = 168.694 us, max = 1.325 ms, min = 6.780 us, total = 60.730 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 121 total (1 active), Execution time: mean = 9.113 us, total = 1.103 ms, Queueing time: mean = 83.552 us, max = 283.776 us, min = 24.809 us, total = 10.110 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.record_metrics - 72 total (1 active), Execution time: mean = 545.327 us, total = 39.264 ms, Queueing time: mean = 324.093 us, max = 1.047 ms, min = 11.295 us, total = 23.335 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 72 total (0 active), Execution time: mean = 1.487 ms, total = 107.029 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 72 total (1 active), Execution time: mean = 
268.939 us, total = 19.364 ms, Queueing time: mean = 595.307 us, max = 1.280 ms, min = 34.942 us, total = 42.862 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 72 total (0 active), Execution time: mean = 51.047 us, total = 3.675 ms, Queueing time: mean = 112.309 us, max = 191.489 us, min = 16.013 us, total = 8.086 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 36 total (1 active), Execution time: mean = 1.645 ms, total = 59.228 ms, Queueing time: mean = 64.114 us, max = 152.067 us, min = 18.037 us, total = 2.308 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 
207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 6 total (1 active, 1 running), Execution time: mean = 2.341 ms, total = 14.045 ms, Queueing time: mean = 56.604 us, max = 103.030 us, min = 45.223 us, total = 339.626 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 502.391 ms, total = 1.005 s, Queueing time: mean = 0.000 s, max = 
-0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 235.496 us, total = 235.496 us, Queueing time: mean = 20.299 us, max = 20.299 us, min = 20.299 us, total = 20.299 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 
0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:08:54,904 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:08:55,944 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, 
node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, 
max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 36997 total (37 active) [state-dump] Queueing time: mean = 1.779 ms, max 
= 59.826 s, min = -0.000 s, total = 65.823 s [state-dump] Execution time: mean = 239.746 us, total = 8.870 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 8812 total (1 active), Execution time: mean = 552.215 us, total = 4.866 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 8812 total (1 active), Execution time: mean = 40.661 us, total = 358.303 ms, Queueing time: mean = 113.663 us, max = 1.815 ms, min = 8.302 us, total = 1.002 s [state-dump] NodeManager.CheckGC - 4196 total (1 active), Execution time: mean = 2.886 us, total = 12.111 ms, Queueing time: mean = 106.247 us, max = 25.875 ms, min = 10.923 us, total = 445.811 ms [state-dump] RaySyncer.OnDemandBroadcasting - 4196 total (1 active), Execution time: mean = 11.593 us, total = 48.643 ms, Queueing time: mean = 98.447 us, max = 25.869 ms, min = 11.266 us, total = 413.085 ms [state-dump] ObjectManager.UpdateAvailableMemory - 4196 total (0 active), Execution time: mean = 6.338 us, total = 26.596 ms, Queueing time: mean = 113.960 us, max = 708.376 us, min = 3.203 us, total = 478.176 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 2099 total (1 active), Execution time: mean = 19.069 us, total = 40.026 ms, Queueing time: mean = 90.560 us, max = 26.386 ms, min = 10.737 us, total = 190.086 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 1677 total (1 active), Execution time: mean = 457.611 us, total = 767.414 ms, Queueing time: mean = 76.907 us, max = 1.481 ms, min = -0.000 s, total = 128.973 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 420 total (0 active), Execution time: mean = 635.381 us, total = 266.860 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 420 total (1 active), Execution time: mean = 
16.191 us, total = 6.800 ms, Queueing time: mean = 82.063 us, max = 2.581 ms, min = 10.666 us, total = 34.466 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 420 total (0 active), Execution time: mean = 103.955 us, total = 43.661 ms, Queueing time: mean = 116.102 us, max = 238.260 us, min = 18.847 us, total = 48.763 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 420 total (1 active), Execution time: mean = 3.099 us, total = 1.302 ms, Queueing time: mean = 174.969 us, max = 1.319 ms, min = 5.573 us, total = 73.487 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 420 total (1 active), Execution time: mean = 8.880 us, total = 3.730 ms, Queueing time: mean = 170.965 us, max = 1.325 ms, min = 6.780 us, total = 71.805 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 141 total (1 active), Execution time: mean = 9.024 us, total = 1.272 ms, Queueing time: mean = 81.195 us, max = 283.776 us, min = 24.809 us, total = 11.448 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] NodeManager.deadline_timer.record_metrics - 84 total (1 active), Execution time: mean = 552.912 us, total = 46.445 ms, Queueing time: mean = 329.566 us, max = 1.144 ms, min = 11.295 us, total = 27.684 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 84 total (0 active), Execution time: mean = 1.495 ms, total = 125.603 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 84 total (1 active), Execution time: mean = 272.060 us, total = 22.853 ms, Queueing time: mean = 605.720 us, max = 1.358 ms, min = 34.942 us, total = 50.881 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 84 total (0 active), 
Execution time: mean = 51.986 us, total = 4.367 ms, Queueing time: mean = 113.774 us, max = 193.657 us, min = 16.013 us, total = 9.557 ms [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 42 total (1 active), Execution time: mean = 1.676 ms, total = 70.404 ms, Queueing time: mean = 65.061 us, max = 152.067 us, min = 18.037 us, total = 2.733 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 
207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 7 total (1 active, 1 running), Execution time: mean = 2.312 ms, total = 16.181 ms, Queueing time: mean = 55.754 us, max = 103.030 us, min = 45.223 us, total = 390.276 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, 
max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 502.391 ms, total = 1.005 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 235.496 us, total = 235.496 us, Queueing time: mean = 20.299 us, max = 20.299 us, min = 20.299 us, total = 20.299 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us 
[state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:09:54,905 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - 
objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:09:55,946 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, 
node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, 
max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 42221 total (35 active) [state-dump] Queueing time: mean = 1.569 ms, max 
= 59.826 s, min = -0.000 s, total = 66.235 s [state-dump] Execution time: mean = 232.240 us, total = 9.805 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 10069 total (0 active), Execution time: mean = 550.466 us, total = 5.543 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 10069 total (0 active), Execution time: mean = 39.928 us, total = 402.038 ms, Queueing time: mean = 113.600 us, max = 2.644 ms, min = 6.292 us, total = 1.144 s [state-dump] RaySyncer.OnDemandBroadcasting - 4795 total (1 active), Execution time: mean = 11.561 us, total = 55.435 ms, Queueing time: mean = 98.043 us, max = 25.869 ms, min = 11.266 us, total = 470.118 ms [state-dump] NodeManager.CheckGC - 4795 total (1 active), Execution time: mean = 2.899 us, total = 13.902 ms, Queueing time: mean = 105.785 us, max = 25.875 ms, min = 10.923 us, total = 507.240 ms [state-dump] ObjectManager.UpdateAvailableMemory - 4795 total (0 active), Execution time: mean = 6.338 us, total = 30.390 ms, Queueing time: mean = 112.989 us, max = 708.376 us, min = 3.203 us, total = 541.785 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 2399 total (1 active), Execution time: mean = 19.114 us, total = 45.854 ms, Queueing time: mean = 89.294 us, max = 26.386 ms, min = 10.737 us, total = 214.216 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 1917 total (1 active), Execution time: mean = 458.552 us, total = 879.044 ms, Queueing time: mean = 76.514 us, max = 1.481 ms, min = -0.000 s, total = 146.677 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 480 total (1 active), Execution time: mean = 16.027 us, total = 7.693 ms, Queueing time: mean = 80.655 us, max = 2.581 ms, min = 10.666 us, total = 38.714 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 480 total (1 active), Execution time: mean = 9.021 
us, total = 4.330 ms, Queueing time: mean = 170.733 us, max = 1.325 ms, min = 6.780 us, total = 81.952 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 480 total (1 active), Execution time: mean = 3.098 us, total = 1.487 ms, Queueing time: mean = 174.840 us, max = 1.319 ms, min = 5.573 us, total = 83.923 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 479 total (0 active), Execution time: mean = 633.362 us, total = 303.380 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 479 total (0 active), Execution time: mean = 103.446 us, total = 49.551 ms, Queueing time: mean = 115.599 us, max = 238.260 us, min = 18.847 us, total = 55.372 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 161 total (1 active), Execution time: mean = 9.001 us, total = 1.449 ms, Queueing time: mean = 81.517 us, max = 283.776 us, min = 20.926 us, total = 13.124 ms [state-dump] NodeManager.deadline_timer.record_metrics - 96 total (1 active), Execution time: mean = 554.004 us, total = 53.184 ms, Queueing time: mean = 330.448 us, max = 1.144 ms, min = 11.295 us, total = 31.723 ms [state-dump] NodeManager.GcsCheckAlive - 96 total (1 active), Execution time: mean = 273.323 us, total = 26.239 ms, Queueing time: mean = 606.441 us, max = 1.358 ms, min = 34.942 us, total = 58.218 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 96 total (0 active), Execution time: mean = 52.206 us, total = 5.012 ms, Queueing time: mean = 113.381 us, max = 193.657 us, min = 16.013 us, total = 10.885 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 96 total (0 active), Execution time: mean = 1.486 ms, total = 142.698 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), 
Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 48 total (1 active), Execution time: mean = 1.688 ms, total = 81.045 ms, Queueing time: mean = 66.211 us, max = 152.067 us, min = 18.037 us, total = 3.178 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: 
mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 8 total (1 active, 1 running), Execution time: mean = 2.352 ms, total = 18.813 ms, Queueing time: mean = 59.819 us, max = 103.030 us, min = 45.223 us, total = 478.551 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 
369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 502.391 ms, total = 1.005 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 
active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 235.496 us, total = 235.496 us, Queueing time: mean = 20.299 us, max = 20.299 us, min = 20.299 us, total = 20.299 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:10:54,905 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects 
evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:10:55,948 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, 
node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, 
max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 47439 total (35 active) [state-dump] Queueing time: mean = 1.405 ms, max 
= 59.826 s, min = -0.000 s, total = 66.647 s [state-dump] Execution time: mean = 226.382 us, total = 10.739 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 11321 total (0 active), Execution time: mean = 549.785 us, total = 6.224 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 11321 total (0 active), Execution time: mean = 39.425 us, total = 446.326 ms, Queueing time: mean = 113.750 us, max = 2.644 ms, min = 5.613 us, total = 1.288 s [state-dump] RaySyncer.OnDemandBroadcasting - 5395 total (1 active), Execution time: mean = 11.482 us, total = 61.944 ms, Queueing time: mean = 97.204 us, max = 25.869 ms, min = 11.164 us, total = 524.416 ms [state-dump] NodeManager.CheckGC - 5395 total (1 active), Execution time: mean = 2.900 us, total = 15.646 ms, Queueing time: mean = 104.861 us, max = 25.875 ms, min = 7.687 us, total = 565.726 ms [state-dump] ObjectManager.UpdateAvailableMemory - 5395 total (0 active), Execution time: mean = 6.291 us, total = 33.943 ms, Queueing time: mean = 112.662 us, max = 1.076 ms, min = 3.203 us, total = 607.810 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 2699 total (1 active), Execution time: mean = 18.874 us, total = 50.940 ms, Queueing time: mean = 87.071 us, max = 26.386 ms, min = 10.737 us, total = 235.004 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 2156 total (1 active), Execution time: mean = 457.172 us, total = 985.662 ms, Queueing time: mean = 76.133 us, max = 1.481 ms, min = -0.000 s, total = 164.143 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 540 total (1 active), Execution time: mean = 15.787 us, total = 8.525 ms, Queueing time: mean = 80.738 us, max = 2.581 ms, min = 10.666 us, total = 43.598 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 540 total (1 active), Execution time: mean = 9.048 
us, total = 4.886 ms, Queueing time: mean = 172.480 us, max = 1.325 ms, min = 6.780 us, total = 93.139 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 540 total (1 active), Execution time: mean = 3.082 us, total = 1.664 ms, Queueing time: mean = 176.613 us, max = 1.319 ms, min = 5.573 us, total = 95.371 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 539 total (0 active), Execution time: mean = 631.143 us, total = 340.186 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 539 total (0 active), Execution time: mean = 102.805 us, total = 55.412 ms, Queueing time: mean = 115.238 us, max = 238.260 us, min = 18.487 us, total = 62.113 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 181 total (1 active), Execution time: mean = 9.043 us, total = 1.637 ms, Queueing time: mean = 81.216 us, max = 283.776 us, min = 20.926 us, total = 14.700 ms [state-dump] NodeManager.deadline_timer.record_metrics - 108 total (1 active), Execution time: mean = 552.564 us, total = 59.677 ms, Queueing time: mean = 338.963 us, max = 1.144 ms, min = 11.295 us, total = 36.608 ms [state-dump] NodeManager.GcsCheckAlive - 108 total (1 active), Execution time: mean = 272.660 us, total = 29.447 ms, Queueing time: mean = 615.410 us, max = 1.358 ms, min = 34.942 us, total = 66.464 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 108 total (0 active), Execution time: mean = 51.922 us, total = 5.608 ms, Queueing time: mean = 112.412 us, max = 193.657 us, min = 16.013 us, total = 12.141 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 108 total (0 active), Execution time: mean = 1.478 ms, total = 159.634 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 
active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 54 total (1 active), Execution time: mean = 1.696 ms, total = 91.609 ms, Queueing time: mean = 66.646 us, max = 152.067 us, min = 18.037 us, total = 3.599 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing 
time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 9 total (1 active, 1 running), Execution time: mean = 2.365 ms, total = 21.286 ms, Queueing time: mean = 54.932 us, max = 103.030 us, min = 15.835 us, total = 494.386 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, 
max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 502.391 ms, total = 1.005 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total 
(0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 235.496 us, total = 235.496 us, Queueing time: mean = 20.299 us, max = 20.299 us, min = 20.299 us, total = 20.299 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:11:54,905 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects 
evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:11:55,951 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, 
node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, 
max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 52669 total (35 active) [state-dump] Queueing time: mean = 1.272 ms, max 
= 59.826 s, min = -0.000 s, total = 66.998 s [state-dump] Execution time: mean = 11.547 ms, total = 608.189 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 12579 total (0 active), Execution time: mean = 543.878 us, total = 6.841 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 12579 total (0 active), Execution time: mean = 38.589 us, total = 485.416 ms, Queueing time: mean = 111.598 us, max = 2.644 ms, min = 4.041 us, total = 1.404 s [state-dump] RaySyncer.OnDemandBroadcasting - 5994 total (1 active), Execution time: mean = 11.312 us, total = 67.804 ms, Queueing time: mean = 95.615 us, max = 25.869 ms, min = 11.164 us, total = 573.114 ms [state-dump] NodeManager.CheckGC - 5994 total (1 active), Execution time: mean = 2.886 us, total = 17.300 ms, Queueing time: mean = 103.120 us, max = 25.875 ms, min = 5.614 us, total = 618.104 ms [state-dump] ObjectManager.UpdateAvailableMemory - 5994 total (0 active), Execution time: mean = 6.221 us, total = 37.289 ms, Queueing time: mean = 110.698 us, max = 1.076 ms, min = 2.228 us, total = 663.522 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 2999 total (1 active), Execution time: mean = 18.729 us, total = 56.168 ms, Queueing time: mean = 85.365 us, max = 26.386 ms, min = 10.737 us, total = 256.010 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 2396 total (1 active), Execution time: mean = 454.796 us, total = 1.090 s, Queueing time: mean = 75.053 us, max = 1.481 ms, min = -0.000 s, total = 179.828 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 600 total (1 active), Execution time: mean = 15.611 us, total = 9.366 ms, Queueing time: mean = 80.584 us, max = 2.581 ms, min = 10.666 us, total = 48.351 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 600 total (1 active), Execution time: mean = 8.924 us, 
total = 5.354 ms, Queueing time: mean = 170.845 us, max = 1.325 ms, min = 6.780 us, total = 102.507 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 600 total (1 active), Execution time: mean = 3.064 us, total = 1.838 ms, Queueing time: mean = 174.911 us, max = 1.319 ms, min = 5.573 us, total = 104.946 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 599 total (0 active), Execution time: mean = 624.422 us, total = 374.029 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 599 total (0 active), Execution time: mean = 101.798 us, total = 60.977 ms, Queueing time: mean = 112.107 us, max = 238.260 us, min = 16.928 us, total = 67.152 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 201 total (1 active), Execution time: mean = 8.920 us, total = 1.793 ms, Queueing time: mean = 79.056 us, max = 283.776 us, min = 20.926 us, total = 15.890 ms [state-dump] NodeManager.deadline_timer.record_metrics - 120 total (1 active), Execution time: mean = 552.280 us, total = 66.274 ms, Queueing time: mean = 332.183 us, max = 1.144 ms, min = 11.295 us, total = 39.862 ms [state-dump] NodeManager.GcsCheckAlive - 120 total (1 active), Execution time: mean = 269.224 us, total = 32.307 ms, Queueing time: mean = 611.517 us, max = 1.358 ms, min = 34.942 us, total = 73.382 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 120 total (0 active), Execution time: mean = 51.302 us, total = 6.156 ms, Queueing time: mean = 109.787 us, max = 193.657 us, min = 16.013 us, total = 13.174 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 120 total (0 active), Execution time: mean = 1.465 ms, total = 175.857 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 
active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 60 total (1 active), Execution time: mean = 1.696 ms, total = 101.787 ms, Queueing time: mean = 65.842 us, max = 152.067 us, min = 18.037 us, total = 3.951 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, 
max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 10 total (1 active, 1 running), Execution time: mean = 2.242 ms, total = 22.418 ms, Queueing time: mean = 70.984 us, max = 215.454 us, min = 15.835 us, total = 709.840 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 3 total (1 active), Execution time: mean = 199.200 s, total = 597.599 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 
ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 2 total (0 active), Execution time: mean = 259.466 us, total = 518.931 us, Queueing time: mean = 34.553 us, max = 48.806 us, min = 20.299 us, total = 69.105 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 23:12:54,906 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes 
unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:12:55,954 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: 
-2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: 
[state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 57903 total (35 active) [state-dump] Queueing time: mean = 1.164 ms, max 
= 59.826 s, min = -0.000 s, total = 67.408 s [state-dump] Execution time: mean = 10.519 ms, total = 609.103 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 13839 total (0 active), Execution time: mean = 542.026 us, total = 7.501 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 13839 total (0 active), Execution time: mean = 38.361 us, total = 530.878 ms, Queueing time: mean = 111.580 us, max = 2.644 ms, min = 4.041 us, total = 1.544 s [state-dump] RaySyncer.OnDemandBroadcasting - 6594 total (1 active), Execution time: mean = 11.315 us, total = 74.608 ms, Queueing time: mean = 95.160 us, max = 25.869 ms, min = 8.666 us, total = 627.482 ms [state-dump] NodeManager.CheckGC - 6594 total (1 active), Execution time: mean = 2.894 us, total = 19.084 ms, Queueing time: mean = 102.663 us, max = 25.875 ms, min = 3.123 us, total = 676.959 ms [state-dump] ObjectManager.UpdateAvailableMemory - 6594 total (0 active), Execution time: mean = 6.223 us, total = 41.036 ms, Queueing time: mean = 110.990 us, max = 1.076 ms, min = 2.228 us, total = 731.870 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 3299 total (1 active), Execution time: mean = 18.903 us, total = 62.361 ms, Queueing time: mean = 84.678 us, max = 26.386 ms, min = 10.737 us, total = 279.351 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 2635 total (1 active), Execution time: mean = 454.043 us, total = 1.196 s, Queueing time: mean = 74.656 us, max = 1.481 ms, min = -0.000 s, total = 196.718 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 660 total (1 active), Execution time: mean = 15.590 us, total = 10.289 ms, Queueing time: mean = 80.474 us, max = 2.581 ms, min = 10.666 us, total = 53.113 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 660 total (1 active), Execution time: mean = 8.893 us, 
total = 5.870 ms, Queueing time: mean = 171.340 us, max = 1.325 ms, min = 6.780 us, total = 113.084 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 660 total (1 active), Execution time: mean = 3.112 us, total = 2.054 ms, Queueing time: mean = 175.389 us, max = 1.319 ms, min = 5.573 us, total = 115.757 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 659 total (0 active), Execution time: mean = 622.126 us, total = 409.981 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 659 total (0 active), Execution time: mean = 101.283 us, total = 66.745 ms, Queueing time: mean = 112.963 us, max = 663.977 us, min = 16.928 us, total = 74.443 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 221 total (1 active), Execution time: mean = 9.012 us, total = 1.992 ms, Queueing time: mean = 78.889 us, max = 283.776 us, min = 20.926 us, total = 17.434 ms [state-dump] NodeManager.deadline_timer.record_metrics - 132 total (1 active), Execution time: mean = 552.277 us, total = 72.901 ms, Queueing time: mean = 333.312 us, max = 1.144 ms, min = 11.295 us, total = 43.997 ms [state-dump] NodeManager.GcsCheckAlive - 132 total (1 active), Execution time: mean = 271.808 us, total = 35.879 ms, Queueing time: mean = 610.578 us, max = 1.358 ms, min = 34.942 us, total = 80.596 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 132 total (0 active), Execution time: mean = 51.069 us, total = 6.741 ms, Queueing time: mean = 108.893 us, max = 193.657 us, min = 16.013 us, total = 14.374 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 132 total (0 active), Execution time: mean = 1.457 ms, total = 192.366 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 
active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 66 total (1 active), Execution time: mean = 1.692 ms, total = 111.695 ms, Queueing time: mean = 65.353 us, max = 152.067 us, min = 18.037 us, total = 4.313 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, 
max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 11 total (1 active, 1 running), Execution time: mean = 2.309 ms, total = 25.402 ms, Queueing time: mean = 72.502 us, max = 215.454 us, min = 15.835 us, total = 797.526 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 3 total (1 active), Execution time: mean = 199.200 s, total = 597.599 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 
ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 2 total (0 active), Execution time: mean = 259.466 us, total = 518.931 us, Queueing time: mean = 34.553 us, max = 48.806 us, min = 20.299 us, total = 69.105 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:13:54,906 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes 
unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:13:55,957 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: 
-2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: 
[state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 63135 total (35 active) [state-dump] Queueing time: mean = 1.074 ms, max 
= 59.826 s, min = -0.000 s, total = 67.822 s [state-dump] Execution time: mean = 9.663 ms, total = 610.066 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 15099 total (0 active), Execution time: mean = 542.986 us, total = 8.199 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 15099 total (0 active), Execution time: mean = 38.542 us, total = 581.947 ms, Queueing time: mean = 111.701 us, max = 2.644 ms, min = 2.750 us, total = 1.687 s [state-dump] RaySyncer.OnDemandBroadcasting - 7193 total (1 active), Execution time: mean = 11.229 us, total = 80.767 ms, Queueing time: mean = 94.719 us, max = 25.869 ms, min = 8.666 us, total = 681.316 ms [state-dump] NodeManager.CheckGC - 7193 total (1 active), Execution time: mean = 2.884 us, total = 20.744 ms, Queueing time: mean = 102.152 us, max = 25.875 ms, min = 3.123 us, total = 734.781 ms [state-dump] ObjectManager.UpdateAvailableMemory - 7193 total (0 active), Execution time: mean = 6.224 us, total = 44.770 ms, Queueing time: mean = 111.489 us, max = 1.076 ms, min = 2.228 us, total = 801.937 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 3599 total (1 active), Execution time: mean = 18.814 us, total = 67.710 ms, Queueing time: mean = 83.696 us, max = 26.386 ms, min = -0.000 s, total = 301.221 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 2875 total (1 active), Execution time: mean = 454.372 us, total = 1.306 s, Queueing time: mean = 74.719 us, max = 1.481 ms, min = -0.000 s, total = 214.816 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 720 total (1 active), Execution time: mean = 15.750 us, total = 11.340 ms, Queueing time: mean = 79.988 us, max = 2.581 ms, min = 10.666 us, total = 57.591 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 720 total (1 active), Execution time: mean = 8.931 us, 
total = 6.430 ms, Queueing time: mean = 172.474 us, max = 1.325 ms, min = 6.780 us, total = 124.181 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 720 total (1 active), Execution time: mean = 3.103 us, total = 2.234 ms, Queueing time: mean = 176.503 us, max = 1.319 ms, min = 5.573 us, total = 127.082 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 719 total (0 active), Execution time: mean = 621.090 us, total = 446.564 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 719 total (0 active), Execution time: mean = 101.084 us, total = 72.679 ms, Queueing time: mean = 113.240 us, max = 663.977 us, min = 16.928 us, total = 81.419 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 241 total (1 active), Execution time: mean = 9.114 us, total = 2.196 ms, Queueing time: mean = 78.273 us, max = 283.776 us, min = 20.926 us, total = 18.864 ms [state-dump] NodeManager.deadline_timer.record_metrics - 144 total (1 active), Execution time: mean = 553.196 us, total = 79.660 ms, Queueing time: mean = 338.477 us, max = 1.144 ms, min = 11.295 us, total = 48.741 ms [state-dump] NodeManager.GcsCheckAlive - 144 total (1 active), Execution time: mean = 274.950 us, total = 39.593 ms, Queueing time: mean = 613.759 us, max = 1.457 ms, min = 34.942 us, total = 88.381 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 144 total (0 active), Execution time: mean = 51.326 us, total = 7.391 ms, Queueing time: mean = 109.979 us, max = 193.657 us, min = 16.013 us, total = 15.837 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 144 total (0 active), Execution time: mean = 1.466 ms, total = 211.078 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 
active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 72 total (1 active), Execution time: mean = 1.701 ms, total = 122.471 ms, Queueing time: mean = 65.474 us, max = 152.067 us, min = 18.037 us, total = 4.714 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 12 total (1 active, 1 running), Execution time: mean = 
2.362 ms, total = 28.343 ms, Queueing time: mean = 74.680 us, max = 215.454 us, min = 15.835 us, total = 896.159 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 3 total (1 active), Execution time: mean = 199.200 s, total = 597.599 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 
ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 2 total (0 active), Execution time: mean = 259.466 us, total = 518.931 us, Queueing time: mean = 34.553 us, max = 48.806 us, min = 20.299 us, total = 69.105 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:14:54,906 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes 
unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:14:55,960 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: 
-2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: 
[state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 68369 total (35 active) [state-dump] Queueing time: mean = 998.150 us, 
max = 59.826 s, min = -0.000 s, total = 68.243 s [state-dump] Execution time: mean = 8.937 ms, total = 611.030 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 16359 total (0 active), Execution time: mean = 543.701 us, total = 8.894 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 16359 total (0 active), Execution time: mean = 38.417 us, total = 628.470 ms, Queueing time: mean = 111.791 us, max = 2.644 ms, min = 2.750 us, total = 1.829 s [state-dump] RaySyncer.OnDemandBroadcasting - 7793 total (1 active), Execution time: mean = 11.260 us, total = 87.752 ms, Queueing time: mean = 94.616 us, max = 25.869 ms, min = 8.666 us, total = 737.342 ms [state-dump] NodeManager.CheckGC - 7793 total (1 active), Execution time: mean = 2.900 us, total = 22.601 ms, Queueing time: mean = 102.062 us, max = 25.875 ms, min = 3.123 us, total = 795.369 ms [state-dump] ObjectManager.UpdateAvailableMemory - 7793 total (0 active), Execution time: mean = 6.260 us, total = 48.786 ms, Queueing time: mean = 111.823 us, max = 1.076 ms, min = 2.228 us, total = 871.436 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 3899 total (1 active), Execution time: mean = 19.093 us, total = 74.445 ms, Queueing time: mean = 83.522 us, max = 26.386 ms, min = -0.000 s, total = 325.651 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 3114 total (1 active), Execution time: mean = 455.751 us, total = 1.419 s, Queueing time: mean = 74.802 us, max = 1.481 ms, min = -0.000 s, total = 232.934 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 780 total (1 active), Execution time: mean = 16.075 us, total = 12.539 ms, Queueing time: mean = 79.626 us, max = 2.581 ms, min = 10.666 us, total = 62.108 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 780 total (1 active), Execution time: mean = 9.011 us, 
total = 7.028 ms, Queueing time: mean = 171.672 us, max = 1.325 ms, min = 6.780 us, total = 133.904 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 780 total (1 active), Execution time: mean = 3.109 us, total = 2.425 ms, Queueing time: mean = 175.735 us, max = 1.319 ms, min = 5.573 us, total = 137.073 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 779 total (0 active), Execution time: mean = 622.368 us, total = 484.825 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 779 total (0 active), Execution time: mean = 101.317 us, total = 78.926 ms, Queueing time: mean = 113.074 us, max = 663.977 us, min = 16.928 us, total = 88.085 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 261 total (1 active), Execution time: mean = 9.203 us, total = 2.402 ms, Queueing time: mean = 78.361 us, max = 283.776 us, min = 20.926 us, total = 20.452 ms [state-dump] NodeManager.deadline_timer.record_metrics - 156 total (1 active), Execution time: mean = 554.667 us, total = 86.528 ms, Queueing time: mean = 335.367 us, max = 1.144 ms, min = 11.295 us, total = 52.317 ms [state-dump] NodeManager.GcsCheckAlive - 156 total (1 active), Execution time: mean = 275.588 us, total = 42.992 ms, Queueing time: mean = 611.669 us, max = 1.457 ms, min = 34.942 us, total = 95.420 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 156 total (0 active), Execution time: mean = 51.319 us, total = 8.006 ms, Queueing time: mean = 138.969 us, max = 4.779 ms, min = 16.013 us, total = 21.679 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 156 total (0 active), Execution time: mean = 1.467 ms, total = 228.857 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), 
Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 78 total (1 active), Execution time: mean = 1.701 ms, total = 132.645 ms, Queueing time: mean = 67.246 us, max = 152.067 us, min = 18.037 us, total = 5.245 ms [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 13 total (1 active, 1 running), Execution time: mean = 2.409 ms, total = 31.312 ms, Queueing time: mean = 75.995 us, max = 215.454 us, min = 15.835 us, total = 987.936 us [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 
us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 3 total (1 active), Execution time: mean = 199.200 s, total = 597.599 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 
2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 2 total (0 active), Execution time: mean = 259.466 us, total = 518.931 us, Queueing time: mean = 34.553 us, max = 48.806 us, min = 20.299 us, total = 69.105 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:15:54,906 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes 
unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:15:55,963 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: 
-2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: 
[state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 73601 total (35 active) [state-dump] Queueing time: mean = 932.402 us, 
max = 59.826 s, min = -0.000 s, total = 68.626 s [state-dump] Execution time: mean = 8.314 ms, total = 611.936 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 17619 total (0 active), Execution time: mean = 542.261 us, total = 9.554 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 17619 total (0 active), Execution time: mean = 37.984 us, total = 669.247 ms, Queueing time: mean = 111.364 us, max = 2.644 ms, min = 2.750 us, total = 1.962 s [state-dump] RaySyncer.OnDemandBroadcasting - 8392 total (1 active), Execution time: mean = 11.153 us, total = 93.593 ms, Queueing time: mean = 94.458 us, max = 25.869 ms, min = 8.560 us, total = 792.693 ms [state-dump] NodeManager.CheckGC - 8392 total (1 active), Execution time: mean = 2.884 us, total = 24.200 ms, Queueing time: mean = 101.812 us, max = 25.875 ms, min = 3.123 us, total = 854.407 ms [state-dump] ObjectManager.UpdateAvailableMemory - 8392 total (0 active), Execution time: mean = 6.210 us, total = 52.117 ms, Queueing time: mean = 110.612 us, max = 1.076 ms, min = 2.228 us, total = 928.253 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 4199 total (1 active), Execution time: mean = 18.969 us, total = 79.650 ms, Queueing time: mean = 82.669 us, max = 26.386 ms, min = -0.000 s, total = 347.128 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 3354 total (1 active), Execution time: mean = 454.703 us, total = 1.525 s, Queueing time: mean = 74.245 us, max = 1.481 ms, min = -0.000 s, total = 249.016 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 840 total (1 active), Execution time: mean = 16.078 us, total = 13.505 ms, Queueing time: mean = 78.767 us, max = 2.581 ms, min = 10.666 us, total = 66.164 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 840 total (1 active), Execution time: mean = 8.979 us, 
total = 7.542 ms, Queueing time: mean = 170.193 us, max = 2.212 ms, min = 6.780 us, total = 142.962 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 840 total (1 active), Execution time: mean = 3.099 us, total = 2.603 ms, Queueing time: mean = 174.221 us, max = 2.222 ms, min = 5.573 us, total = 146.346 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 839 total (0 active), Execution time: mean = 622.557 us, total = 522.326 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 839 total (0 active), Execution time: mean = 100.785 us, total = 84.559 ms, Queueing time: mean = 112.707 us, max = 663.977 us, min = 16.928 us, total = 94.561 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 281 total (1 active), Execution time: mean = 9.146 us, total = 2.570 ms, Queueing time: mean = 77.238 us, max = 283.776 us, min = 14.912 us, total = 21.704 ms [state-dump] NodeManager.deadline_timer.record_metrics - 168 total (1 active), Execution time: mean = 555.670 us, total = 93.353 ms, Queueing time: mean = 328.215 us, max = 1.144 ms, min = 11.295 us, total = 55.140 ms [state-dump] NodeManager.GcsCheckAlive - 168 total (1 active), Execution time: mean = 272.682 us, total = 45.811 ms, Queueing time: mean = 608.223 us, max = 1.457 ms, min = 34.942 us, total = 102.181 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 168 total (0 active), Execution time: mean = 51.067 us, total = 8.579 ms, Queueing time: mean = 135.043 us, max = 4.779 ms, min = 11.561 us, total = 22.687 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 168 total (0 active), Execution time: mean = 1.458 ms, total = 244.887 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 
active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 84 total (1 active), Execution time: mean = 1.694 ms, total = 142.309 ms, Queueing time: mean = 66.657 us, max = 152.067 us, min = 18.037 us, total = 5.599 ms [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 14 total (1 active, 1 running), Execution time: mean = 2.450 ms, total = 34.305 ms, Queueing time: mean = 76.349 us, max = 215.454 us, min = 15.835 us, total = 1.069 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 3 total (1 active), Execution time: mean = 199.200 s, total = 597.599 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, 
total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 2 total (0 active), Execution time: mean = 259.466 us, total = 518.931 us, Queueing time: mean = 34.553 us, max = 48.806 us, min = 20.299 us, total = 69.105 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:16:54,907 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes 
unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:16:55,965 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: 
-2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: 
[state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 78836 total (35 active) [state-dump] Queueing time: mean = 875.663 us, 
max = 59.826 s, min = -0.000 s, total = 69.034 s [state-dump] Execution time: mean = 7.774 ms, total = 612.866 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 18879 total (0 active), Execution time: mean = 541.765 us, total = 10.228 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 18879 total (0 active), Execution time: mean = 37.790 us, total = 713.445 ms, Queueing time: mean = 111.665 us, max = 2.644 ms, min = 2.750 us, total = 2.108 s [state-dump] RaySyncer.OnDemandBroadcasting - 8992 total (1 active), Execution time: mean = 11.107 us, total = 99.876 ms, Queueing time: mean = 94.104 us, max = 25.869 ms, min = 8.560 us, total = 846.180 ms [state-dump] NodeManager.CheckGC - 8992 total (1 active), Execution time: mean = 2.885 us, total = 25.943 ms, Queueing time: mean = 101.413 us, max = 25.875 ms, min = 3.123 us, total = 911.909 ms [state-dump] ObjectManager.UpdateAvailableMemory - 8992 total (0 active), Execution time: mean = 6.212 us, total = 55.862 ms, Queueing time: mean = 110.733 us, max = 1.076 ms, min = 2.228 us, total = 995.709 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 4499 total (1 active), Execution time: mean = 18.848 us, total = 84.796 ms, Queueing time: mean = 81.814 us, max = 26.386 ms, min = -0.000 s, total = 368.083 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 3593 total (1 active), Execution time: mean = 454.672 us, total = 1.634 s, Queueing time: mean = 74.118 us, max = 1.481 ms, min = -0.000 s, total = 266.307 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 900 total (1 active), Execution time: mean = 16.173 us, total = 14.556 ms, Queueing time: mean = 78.628 us, max = 2.581 ms, min = 10.666 us, total = 70.765 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 900 total (1 active), Execution time: mean = 8.947 
us, total = 8.053 ms, Queueing time: mean = 170.192 us, max = 2.212 ms, min = 6.780 us, total = 153.173 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 900 total (1 active), Execution time: mean = 3.092 us, total = 2.783 ms, Queueing time: mean = 174.194 us, max = 2.222 ms, min = 5.573 us, total = 156.775 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 899 total (0 active), Execution time: mean = 622.557 us, total = 559.678 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 899 total (0 active), Execution time: mean = 100.787 us, total = 90.608 ms, Queueing time: mean = 113.050 us, max = 663.977 us, min = 16.928 us, total = 101.632 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 301 total (1 active), Execution time: mean = 9.136 us, total = 2.750 ms, Queueing time: mean = 77.399 us, max = 283.776 us, min = 14.912 us, total = 23.297 ms [state-dump] NodeManager.deadline_timer.record_metrics - 180 total (1 active), Execution time: mean = 556.364 us, total = 100.146 ms, Queueing time: mean = 324.465 us, max = 1.144 ms, min = 11.295 us, total = 58.404 ms [state-dump] NodeManager.GcsCheckAlive - 180 total (1 active), Execution time: mean = 275.792 us, total = 49.642 ms, Queueing time: mean = 603.821 us, max = 1.457 ms, min = 34.942 us, total = 108.688 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 180 total (0 active), Execution time: mean = 51.114 us, total = 9.201 ms, Queueing time: mean = 132.409 us, max = 4.779 ms, min = 11.561 us, total = 23.834 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 180 total (0 active), Execution time: mean = 1.462 ms, total = 263.094 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 
active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 90 total (1 active), Execution time: mean = 1.685 ms, total = 151.632 ms, Queueing time: mean = 66.279 us, max = 157.349 us, min = 15.806 us, total = 5.965 ms [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 15 total (1 active, 1 running), Execution time: mean = 2.472 ms, total = 37.075 ms, Queueing time: mean = 72.824 us, max = 215.454 us, min = 15.835 us, total = 1.092 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 3 total (1 active), Execution time: mean = 199.200 s, total = 597.599 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, 
total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 2 total (0 active), Execution time: mean = 259.466 us, total = 518.931 us, Queueing time: mean = 34.553 us, max = 48.806 us, min = 20.299 us, total = 69.105 us [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 4.207 us, total = 8.414 us, Queueing time: mean = 41.935 us, max = 83.871 us, min = 83.871 us, total = 83.871 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 23:17:54,907 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes 
unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:17:55,967 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: 
-2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: 
[state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 84067 total (35 active) [state-dump] Queueing time: mean = 826.384 us, 
max = 59.826 s, min = -0.000 s, total = 69.472 s [state-dump] Execution time: mean = 7.302 ms, total = 613.833 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 20139 total (0 active), Execution time: mean = 542.628 us, total = 10.928 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 20139 total (0 active), Execution time: mean = 37.675 us, total = 758.734 ms, Queueing time: mean = 112.291 us, max = 2.644 ms, min = 2.750 us, total = 2.261 s [state-dump] ObjectManager.UpdateAvailableMemory - 9591 total (0 active), Execution time: mean = 6.229 us, total = 59.740 ms, Queueing time: mean = 110.887 us, max = 1.076 ms, min = 2.228 us, total = 1.064 s [state-dump] NodeManager.CheckGC - 9591 total (1 active), Execution time: mean = 2.889 us, total = 27.707 ms, Queueing time: mean = 101.405 us, max = 25.875 ms, min = 3.123 us, total = 972.578 ms [state-dump] RaySyncer.OnDemandBroadcasting - 9591 total (1 active), Execution time: mean = 11.093 us, total = 106.390 ms, Queueing time: mean = 94.120 us, max = 25.869 ms, min = 8.560 us, total = 902.704 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 4798 total (1 active), Execution time: mean = 18.928 us, total = 90.817 ms, Queueing time: mean = 81.574 us, max = 26.386 ms, min = -0.000 s, total = 391.391 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 3833 total (1 active), Execution time: mean = 455.102 us, total = 1.744 s, Queueing time: mean = 74.546 us, max = 1.481 ms, min = -0.000 s, total = 285.736 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 960 total (1 active), Execution time: mean = 3.096 us, total = 2.972 ms, Queueing time: mean = 176.300 us, max = 2.222 ms, min = 5.573 us, total = 169.248 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 960 total (1 active), Execution 
time: mean = 8.939 us, total = 8.581 ms, Queueing time: mean = 172.321 us, max = 2.212 ms, min = 6.780 us, total = 165.428 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 960 total (1 active), Execution time: mean = 16.276 us, total = 15.625 ms, Queueing time: mean = 78.267 us, max = 2.581 ms, min = 10.666 us, total = 75.136 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 959 total (0 active), Execution time: mean = 100.705 us, total = 96.576 ms, Queueing time: mean = 116.210 us, max = 1.722 ms, min = 16.928 us, total = 111.445 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 959 total (0 active), Execution time: mean = 625.699 us, total = 600.046 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 321 total (1 active), Execution time: mean = 9.255 us, total = 2.971 ms, Queueing time: mean = 77.761 us, max = 283.776 us, min = 14.912 us, total = 24.961 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 192 total (0 active), Execution time: mean = 1.464 ms, total = 281.117 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 192 total (1 active), Execution time: mean = 559.951 us, total = 107.511 ms, Queueing time: mean = 332.228 us, max = 1.419 ms, min = 11.295 us, total = 63.788 ms [state-dump] NodeManager.GcsCheckAlive - 192 total (1 active), Execution time: mean = 275.761 us, total = 52.946 ms, Queueing time: mean = 613.667 us, max = 1.682 ms, min = 34.942 us, total = 117.824 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 192 total (0 active), Execution time: mean = 51.289 us, total = 9.847 ms, Queueing time: mean = 130.899 us, max = 4.779 ms, min = 11.561 us, total = 25.133 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 96 total (1 active), 
Execution time: mean = 1.704 ms, total = 163.630 ms, Queueing time: mean = 66.449 us, max = 157.349 us, min = 15.806 us, total = 6.379 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 16 total (1 active, 1 running), Execution time: mean = 2.484 ms, total = 39.750 ms, Queueing time: mean = 71.434 us, max = 215.454 us, min = 15.835 us, total = 1.143 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 3 total (1 active), Execution time: mean = 199.200 s, total = 597.599 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, 
total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 4.207 us, total = 8.414 us, Queueing time: mean = 41.935 us, max = 83.871 us, min = 83.871 us, total = 83.871 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 2 total (0 active), Execution time: mean = 259.466 us, total = 518.931 us, Queueing time: mean = 34.553 us, max = 48.806 us, min = 20.299 us, total = 69.105 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:18:54,907 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:18:55,970 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 89298 total (35 active) [state-dump] Queueing time: mean = 782.537 us, 
max = 59.826 s, min = -0.000 s, total = 69.879 s [state-dump] Execution time: mean = 6.884 ms, total = 614.741 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 21399 total (0 active), Execution time: mean = 541.189 us, total = 11.581 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 21399 total (0 active), Execution time: mean = 37.506 us, total = 802.584 ms, Queueing time: mean = 112.847 us, max = 2.644 ms, min = 2.750 us, total = 2.415 s [state-dump] ObjectManager.UpdateAvailableMemory - 10190 total (0 active), Execution time: mean = 6.220 us, total = 63.386 ms, Queueing time: mean = 110.270 us, max = 1.076 ms, min = 2.228 us, total = 1.124 s [state-dump] NodeManager.CheckGC - 10190 total (1 active), Execution time: mean = 2.896 us, total = 29.510 ms, Queueing time: mean = 101.287 us, max = 25.875 ms, min = 3.123 us, total = 1.032 s [state-dump] RaySyncer.OnDemandBroadcasting - 10190 total (1 active), Execution time: mean = 11.113 us, total = 113.247 ms, Queueing time: mean = 93.987 us, max = 25.869 ms, min = 8.560 us, total = 957.732 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 5098 total (1 active), Execution time: mean = 18.984 us, total = 96.779 ms, Queueing time: mean = 80.973 us, max = 26.386 ms, min = -0.000 s, total = 412.799 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 4072 total (1 active), Execution time: mean = 455.257 us, total = 1.854 s, Queueing time: mean = 74.396 us, max = 1.481 ms, min = -0.000 s, total = 302.941 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1020 total (1 active), Execution time: mean = 3.105 us, total = 3.167 ms, Queueing time: mean = 174.826 us, max = 2.222 ms, min = 5.004 us, total = 178.323 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1020 total (1 active), 
Execution time: mean = 8.972 us, total = 9.151 ms, Queueing time: mean = 170.842 us, max = 2.212 ms, min = 6.450 us, total = 174.259 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1020 total (1 active), Execution time: mean = 16.266 us, total = 16.592 ms, Queueing time: mean = 77.481 us, max = 2.581 ms, min = 10.666 us, total = 79.031 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1019 total (0 active), Execution time: mean = 100.528 us, total = 102.438 ms, Queueing time: mean = 115.693 us, max = 1.722 ms, min = 8.842 us, total = 117.891 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1019 total (0 active), Execution time: mean = 624.030 us, total = 635.886 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 341 total (1 active), Execution time: mean = 9.284 us, total = 3.166 ms, Queueing time: mean = 77.544 us, max = 283.776 us, min = 14.912 us, total = 26.443 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 204 total (0 active), Execution time: mean = 1.468 ms, total = 299.380 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 204 total (1 active), Execution time: mean = 558.827 us, total = 114.001 ms, Queueing time: mean = 326.960 us, max = 1.419 ms, min = 11.295 us, total = 66.700 ms [state-dump] NodeManager.GcsCheckAlive - 204 total (1 active), Execution time: mean = 277.087 us, total = 56.526 ms, Queueing time: mean = 606.849 us, max = 1.682 ms, min = 6.690 us, total = 123.797 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 204 total (0 active), Execution time: mean = 51.101 us, total = 10.425 ms, Queueing time: mean = 130.023 us, max = 4.779 ms, min = 11.561 us, total = 26.525 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 102 total (1 
active), Execution time: mean = 1.697 ms, total = 173.097 ms, Queueing time: mean = 68.025 us, max = 163.431 us, min = 15.806 us, total = 6.939 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 17 total (1 active, 1 running), Execution time: mean = 2.429 ms, total = 41.289 ms, Queueing time: mean = 70.834 us, max = 215.454 us, min = 15.835 us, total = 1.204 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean 
= 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 3 total (1 active), Execution time: mean = 199.200 s, total = 597.599 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 
ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 4.207 us, total = 8.414 us, Queueing time: mean = 41.935 us, max = 83.871 us, min = 83.871 us, total = 83.871 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 2 total (0 active), Execution time: mean = 259.466 us, total = 518.931 us, Queueing time: mean = 34.553 us, max = 48.806 us, min = 20.299 us, total = 69.105 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:19:54,907 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:19:55,973 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 94533 total (35 active) [state-dump] Queueing time: mean = 743.472 us, 
max = 59.826 s, min = -0.000 s, total = 70.283 s [state-dump] Execution time: mean = 6.513 ms, total = 615.656 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 22659 total (0 active), Execution time: mean = 540.182 us, total = 12.240 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 22659 total (0 active), Execution time: mean = 37.309 us, total = 845.375 ms, Queueing time: mean = 112.785 us, max = 2.644 ms, min = 2.750 us, total = 2.556 s [state-dump] ObjectManager.UpdateAvailableMemory - 10790 total (0 active), Execution time: mean = 6.214 us, total = 67.046 ms, Queueing time: mean = 109.748 us, max = 1.076 ms, min = 2.228 us, total = 1.184 s [state-dump] NodeManager.CheckGC - 10790 total (1 active), Execution time: mean = 2.903 us, total = 31.327 ms, Queueing time: mean = 101.269 us, max = 25.875 ms, min = 3.123 us, total = 1.093 s [state-dump] RaySyncer.OnDemandBroadcasting - 10790 total (1 active), Execution time: mean = 11.132 us, total = 120.120 ms, Queueing time: mean = 93.958 us, max = 25.869 ms, min = 8.560 us, total = 1.014 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 5398 total (1 active), Execution time: mean = 18.974 us, total = 102.422 ms, Queueing time: mean = 80.357 us, max = 26.386 ms, min = -0.000 s, total = 433.767 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 4312 total (1 active), Execution time: mean = 455.540 us, total = 1.964 s, Queueing time: mean = 74.843 us, max = 1.481 ms, min = -0.000 s, total = 322.721 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1080 total (1 active), Execution time: mean = 3.111 us, total = 3.360 ms, Queueing time: mean = 174.801 us, max = 2.222 ms, min = 5.004 us, total = 188.785 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1080 total (1 active), Execution 
time: mean = 9.018 us, total = 9.739 ms, Queueing time: mean = 170.786 us, max = 2.212 ms, min = 6.450 us, total = 184.449 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1080 total (1 active), Execution time: mean = 16.409 us, total = 17.721 ms, Queueing time: mean = 77.226 us, max = 2.581 ms, min = 10.666 us, total = 83.404 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1079 total (0 active), Execution time: mean = 100.118 us, total = 108.027 ms, Queueing time: mean = 114.763 us, max = 1.722 ms, min = 6.847 us, total = 123.829 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1079 total (0 active), Execution time: mean = 621.255 us, total = 670.335 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 361 total (1 active), Execution time: mean = 9.302 us, total = 3.358 ms, Queueing time: mean = 77.408 us, max = 283.776 us, min = 14.912 us, total = 27.944 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 216 total (0 active), Execution time: mean = 1.475 ms, total = 318.509 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 216 total (1 active), Execution time: mean = 557.784 us, total = 120.481 ms, Queueing time: mean = 328.164 us, max = 1.419 ms, min = 11.295 us, total = 70.883 ms [state-dump] NodeManager.GcsCheckAlive - 216 total (1 active), Execution time: mean = 280.554 us, total = 60.600 ms, Queueing time: mean = 603.590 us, max = 1.682 ms, min = 6.690 us, total = 130.375 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 216 total (0 active), Execution time: mean = 51.239 us, total = 11.068 ms, Queueing time: mean = 128.791 us, max = 4.779 ms, min = 11.561 us, total = 27.819 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 108 total (1 active), 
Execution time: mean = 1.696 ms, total = 183.220 ms, Queueing time: mean = 67.879 us, max = 163.431 us, min = 15.806 us, total = 7.331 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 18 total (1 active, 1 running), Execution time: mean = 2.450 ms, total = 44.093 ms, Queueing time: mean = 70.655 us, max = 215.454 us, min = 15.835 us, total = 1.272 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 3 total (1 active), Execution time: mean = 199.200 s, total = 597.599 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, 
total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 4.207 us, total = 8.414 us, Queueing time: mean = 41.935 us, max = 83.871 us, min = 83.871 us, total = 83.871 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 2 total (0 active), Execution time: mean = 259.466 us, total = 518.931 us, Queueing time: mean = 34.553 us, max = 48.806 us, min = 20.299 us, total = 69.105 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:20:54,908 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:20:55,974 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 99764 total (35 active) [state-dump] Queueing time: mean = 708.688 us, 
max = 59.826 s, min = -0.000 s, total = 70.702 s [state-dump] Execution time: mean = 6.181 ms, total = 616.613 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 23919 total (0 active), Execution time: mean = 540.630 us, total = 12.931 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 23919 total (0 active), Execution time: mean = 37.236 us, total = 890.644 ms, Queueing time: mean = 112.939 us, max = 2.644 ms, min = 2.750 us, total = 2.701 s [state-dump] ObjectManager.UpdateAvailableMemory - 11389 total (0 active), Execution time: mean = 6.211 us, total = 70.741 ms, Queueing time: mean = 109.933 us, max = 1.076 ms, min = 2.228 us, total = 1.252 s [state-dump] NodeManager.CheckGC - 11389 total (1 active), Execution time: mean = 2.903 us, total = 33.059 ms, Queueing time: mean = 101.118 us, max = 25.875 ms, min = 3.123 us, total = 1.152 s [state-dump] RaySyncer.OnDemandBroadcasting - 11389 total (1 active), Execution time: mean = 11.103 us, total = 126.449 ms, Queueing time: mean = 93.834 us, max = 25.869 ms, min = 8.560 us, total = 1.069 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 5698 total (1 active), Execution time: mean = 19.021 us, total = 108.379 ms, Queueing time: mean = 80.438 us, max = 26.386 ms, min = -0.000 s, total = 458.337 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 4551 total (1 active), Execution time: mean = 455.658 us, total = 2.074 s, Queueing time: mean = 74.712 us, max = 1.481 ms, min = -0.000 s, total = 340.015 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1140 total (1 active), Execution time: mean = 3.118 us, total = 3.554 ms, Queueing time: mean = 175.699 us, max = 2.222 ms, min = 5.004 us, total = 200.297 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1140 total (1 active), Execution 
time: mean = 9.065 us, total = 10.334 ms, Queueing time: mean = 171.657 us, max = 2.212 ms, min = 6.450 us, total = 195.689 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1140 total (1 active), Execution time: mean = 16.468 us, total = 18.773 ms, Queueing time: mean = 77.525 us, max = 2.581 ms, min = 10.666 us, total = 88.378 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1139 total (0 active), Execution time: mean = 100.001 us, total = 113.901 ms, Queueing time: mean = 114.504 us, max = 1.722 ms, min = 6.847 us, total = 130.420 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1139 total (0 active), Execution time: mean = 621.093 us, total = 707.425 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 381 total (1 active), Execution time: mean = 12.275 us, total = 4.677 ms, Queueing time: mean = 77.265 us, max = 283.776 us, min = 14.912 us, total = 29.438 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 228 total (0 active), Execution time: mean = 1.487 ms, total = 339.072 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 228 total (1 active), Execution time: mean = 562.171 us, total = 128.175 ms, Queueing time: mean = 328.381 us, max = 1.419 ms, min = 11.295 us, total = 74.871 ms [state-dump] NodeManager.GcsCheckAlive - 228 total (1 active), Execution time: mean = 282.908 us, total = 64.503 ms, Queueing time: mean = 606.174 us, max = 1.682 ms, min = 6.690 us, total = 138.208 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 228 total (0 active), Execution time: mean = 51.704 us, total = 11.789 ms, Queueing time: mean = 128.372 us, max = 4.779 ms, min = 11.561 us, total = 29.269 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 114 total (1 
active), Execution time: mean = 1.703 ms, total = 194.128 ms, Queueing time: mean = 67.967 us, max = 163.431 us, min = 15.806 us, total = 7.748 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 19 total (1 active, 1 running), Execution time: mean = 2.471 ms, total = 46.957 ms, Queueing time: mean = 70.181 us, max = 215.454 us, min = 15.835 us, total = 1.333 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean 
= 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 3 total (1 active), Execution time: mean = 199.200 s, total = 597.599 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 
ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 4.207 us, total = 8.414 us, Queueing time: mean = 41.935 us, max = 83.871 us, min = 83.871 us, total = 83.871 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 2 total (0 active), Execution time: mean = 259.466 us, total = 518.931 us, Queueing time: mean = 34.553 us, max = 48.806 us, min = 20.299 us, total = 69.105 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 0 [state-dump] [state-dump] [2025-01-20 23:21:54,908 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:21:55,977 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 105001 total (35 active) [state-dump] Queueing time: mean = 677.451 us, 
max = 59.826 s, min = -0.000 s, total = 71.133 s [state-dump] Execution time: mean = 11.596 ms, total = 1217.582 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 25179 total (0 active), Execution time: mean = 541.594 us, total = 13.637 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 25179 total (0 active), Execution time: mean = 37.155 us, total = 935.534 ms, Queueing time: mean = 113.217 us, max = 2.644 ms, min = 2.750 us, total = 2.851 s [state-dump] ObjectManager.UpdateAvailableMemory - 11989 total (0 active), Execution time: mean = 6.217 us, total = 74.533 ms, Queueing time: mean = 110.670 us, max = 1.076 ms, min = 2.228 us, total = 1.327 s [state-dump] NodeManager.CheckGC - 11989 total (1 active), Execution time: mean = 2.900 us, total = 34.772 ms, Queueing time: mean = 101.120 us, max = 25.875 ms, min = 3.123 us, total = 1.212 s [state-dump] RaySyncer.OnDemandBroadcasting - 11989 total (1 active), Execution time: mean = 11.063 us, total = 132.635 ms, Queueing time: mean = 93.872 us, max = 25.869 ms, min = 8.560 us, total = 1.125 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 5998 total (1 active), Execution time: mean = 18.960 us, total = 113.720 ms, Queueing time: mean = 80.395 us, max = 26.386 ms, min = -0.000 s, total = 482.210 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 4791 total (1 active), Execution time: mean = 455.894 us, total = 2.184 s, Queueing time: mean = 74.933 us, max = 1.481 ms, min = -0.000 s, total = 359.003 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1200 total (1 active), Execution time: mean = 3.130 us, total = 3.755 ms, Queueing time: mean = 175.874 us, max = 2.222 ms, min = 5.004 us, total = 211.049 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1200 total (1 active), 
Execution time: mean = 9.068 us, total = 10.881 ms, Queueing time: mean = 171.834 us, max = 2.212 ms, min = 6.450 us, total = 206.200 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1200 total (1 active), Execution time: mean = 16.485 us, total = 19.782 ms, Queueing time: mean = 77.429 us, max = 2.581 ms, min = 10.666 us, total = 92.914 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1199 total (0 active), Execution time: mean = 99.663 us, total = 119.496 ms, Queueing time: mean = 114.332 us, max = 1.722 ms, min = 6.847 us, total = 137.084 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1199 total (0 active), Execution time: mean = 621.609 us, total = 745.310 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 401 total (1 active), Execution time: mean = 12.082 us, total = 4.845 ms, Queueing time: mean = 77.073 us, max = 283.776 us, min = 14.912 us, total = 30.906 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 240 total (0 active), Execution time: mean = 1.496 ms, total = 359.122 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 240 total (1 active), Execution time: mean = 562.301 us, total = 134.952 ms, Queueing time: mean = 329.468 us, max = 1.419 ms, min = 11.295 us, total = 79.072 ms [state-dump] NodeManager.GcsCheckAlive - 240 total (1 active), Execution time: mean = 285.033 us, total = 68.408 ms, Queueing time: mean = 605.060 us, max = 1.682 ms, min = 6.690 us, total = 145.214 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 240 total (0 active), Execution time: mean = 52.018 us, total = 12.484 ms, Queueing time: mean = 127.638 us, max = 4.779 ms, min = 11.561 us, total = 30.633 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 120 total 
(1 active), Execution time: mean = 1.708 ms, total = 205.006 ms, Queueing time: mean = 68.093 us, max = 163.431 us, min = 15.806 us, total = 8.171 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 20 total (1 active, 1 running), Execution time: mean = 2.420 ms, total = 48.408 ms, Queueing time: mean = 69.176 us, max = 215.454 us, min = 15.835 us, total = 1.384 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: 
mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 4 total (1 active), Execution time: mean = 299.400 s, total = 1197.600 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 3 total (0 active), Execution time: 
mean = 294.453 us, total = 883.360 us, Queueing time: mean = 76.688 us, max = 160.959 us, min = 20.299 us, total = 230.064 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 4.207 us, total = 8.414 us, Queueing time: mean = 41.935 us, max = 83.871 us, min = 83.871 us, total = 83.871 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:22:54,908 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:22:55,980 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 110232 total (35 active) [state-dump] Queueing time: mean = 649.123 us, 
max = 59.826 s, min = -0.000 s, total = 71.554 s [state-dump] Execution time: mean = 11.054 ms, total = 1218.536 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 26439 total (0 active), Execution time: mean = 541.946 us, total = 14.329 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 26439 total (0 active), Execution time: mean = 37.059 us, total = 979.815 ms, Queueing time: mean = 113.161 us, max = 2.644 ms, min = 2.750 us, total = 2.992 s [state-dump] ObjectManager.UpdateAvailableMemory - 12588 total (0 active), Execution time: mean = 6.218 us, total = 78.271 ms, Queueing time: mean = 111.032 us, max = 1.076 ms, min = 2.228 us, total = 1.398 s [state-dump] NodeManager.CheckGC - 12588 total (1 active), Execution time: mean = 2.895 us, total = 36.442 ms, Queueing time: mean = 101.113 us, max = 25.875 ms, min = 3.123 us, total = 1.273 s [state-dump] RaySyncer.OnDemandBroadcasting - 12588 total (1 active), Execution time: mean = 11.016 us, total = 138.667 ms, Queueing time: mean = 93.908 us, max = 25.869 ms, min = 8.560 us, total = 1.182 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 6298 total (1 active), Execution time: mean = 18.849 us, total = 118.711 ms, Queueing time: mean = 79.996 us, max = 26.386 ms, min = -0.000 s, total = 503.815 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 5030 total (1 active), Execution time: mean = 455.953 us, total = 2.293 s, Queueing time: mean = 74.952 us, max = 1.481 ms, min = -0.000 s, total = 377.008 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1260 total (1 active), Execution time: mean = 3.161 us, total = 3.983 ms, Queueing time: mean = 177.058 us, max = 2.222 ms, min = 5.004 us, total = 223.093 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1260 total (1 active), 
Execution time: mean = 9.075 us, total = 11.434 ms, Queueing time: mean = 173.042 us, max = 2.212 ms, min = 6.450 us, total = 218.032 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1260 total (1 active), Execution time: mean = 16.504 us, total = 20.795 ms, Queueing time: mean = 77.434 us, max = 2.581 ms, min = 10.666 us, total = 97.567 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1259 total (0 active), Execution time: mean = 99.916 us, total = 125.794 ms, Queueing time: mean = 114.371 us, max = 1.722 ms, min = 6.847 us, total = 143.993 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1259 total (0 active), Execution time: mean = 622.428 us, total = 783.637 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 421 total (1 active), Execution time: mean = 11.968 us, total = 5.038 ms, Queueing time: mean = 78.159 us, max = 341.916 us, min = 14.912 us, total = 32.905 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 252 total (0 active), Execution time: mean = 1.501 ms, total = 378.174 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 252 total (1 active), Execution time: mean = 566.142 us, total = 142.668 ms, Queueing time: mean = 331.751 us, max = 1.419 ms, min = 11.295 us, total = 83.601 ms [state-dump] NodeManager.GcsCheckAlive - 252 total (1 active), Execution time: mean = 286.697 us, total = 72.248 ms, Queueing time: mean = 609.524 us, max = 1.682 ms, min = 6.690 us, total = 153.600 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 252 total (0 active), Execution time: mean = 52.234 us, total = 13.163 ms, Queueing time: mean = 127.202 us, max = 4.779 ms, min = 11.561 us, total = 32.055 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 126 total 
(1 active), Execution time: mean = 1.719 ms, total = 216.647 ms, Queueing time: mean = 68.665 us, max = 163.431 us, min = 15.806 us, total = 8.652 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 21 total (1 active, 1 running), Execution time: mean = 2.431 ms, total = 51.056 ms, Queueing time: mean = 68.844 us, max = 215.454 us, min = 15.835 us, total = 1.446 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: 
mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 4 total (1 active), Execution time: mean = 299.400 s, total = 1197.600 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 3 total (0 active), Execution time: 
mean = 294.453 us, total = 883.360 us, Queueing time: mean = 76.688 us, max = 160.959 us, min = 20.299 us, total = 230.064 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 4.207 us, total = 8.414 us, Queueing time: mean = 41.935 us, max = 83.871 us, min = 83.871 us, total = 83.871 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:23:54,909 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:23:55,983 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 115464 total (35 active) [state-dump] Queueing time: mean = 623.384 us, 
max = 59.826 s, min = -0.000 s, total = 71.978 s [state-dump] Execution time: mean = 10.562 ms, total = 1219.492 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 27699 total (0 active), Execution time: mean = 542.337 us, total = 15.022 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 27699 total (0 active), Execution time: mean = 36.910 us, total = 1.022 s, Queueing time: mean = 113.282 us, max = 2.644 ms, min = 2.750 us, total = 3.138 s [state-dump] ObjectManager.UpdateAvailableMemory - 13187 total (0 active), Execution time: mean = 6.221 us, total = 82.040 ms, Queueing time: mean = 111.248 us, max = 1.076 ms, min = 2.228 us, total = 1.467 s [state-dump] NodeManager.CheckGC - 13187 total (1 active), Execution time: mean = 2.893 us, total = 38.148 ms, Queueing time: mean = 101.085 us, max = 25.875 ms, min = 3.123 us, total = 1.333 s [state-dump] RaySyncer.OnDemandBroadcasting - 13187 total (1 active), Execution time: mean = 10.984 us, total = 144.840 ms, Queueing time: mean = 93.909 us, max = 25.869 ms, min = 8.560 us, total = 1.238 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 6598 total (1 active), Execution time: mean = 18.824 us, total = 124.202 ms, Queueing time: mean = 79.916 us, max = 26.386 ms, min = -0.000 s, total = 527.283 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 5270 total (1 active), Execution time: mean = 456.111 us, total = 2.404 s, Queueing time: mean = 75.018 us, max = 1.481 ms, min = -0.000 s, total = 395.346 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1320 total (1 active), Execution time: mean = 3.156 us, total = 4.166 ms, Queueing time: mean = 177.681 us, max = 2.222 ms, min = 5.004 us, total = 234.539 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1320 total (1 active), Execution 
time: mean = 9.062 us, total = 11.962 ms, Queueing time: mean = 173.667 us, max = 2.212 ms, min = 6.450 us, total = 229.240 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1320 total (1 active), Execution time: mean = 16.538 us, total = 21.830 ms, Queueing time: mean = 77.439 us, max = 2.581 ms, min = 10.666 us, total = 102.220 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1319 total (0 active), Execution time: mean = 99.996 us, total = 131.894 ms, Queueing time: mean = 114.686 us, max = 1.722 ms, min = 6.847 us, total = 151.271 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1319 total (0 active), Execution time: mean = 623.906 us, total = 822.932 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 441 total (1 active), Execution time: mean = 11.833 us, total = 5.218 ms, Queueing time: mean = 78.143 us, max = 341.916 us, min = 14.912 us, total = 34.461 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 264 total (0 active), Execution time: mean = 1.505 ms, total = 397.296 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 264 total (1 active), Execution time: mean = 566.904 us, total = 149.663 ms, Queueing time: mean = 334.497 us, max = 1.419 ms, min = 11.295 us, total = 88.307 ms [state-dump] NodeManager.GcsCheckAlive - 264 total (1 active), Execution time: mean = 288.155 us, total = 76.073 ms, Queueing time: mean = 611.628 us, max = 1.682 ms, min = 6.690 us, total = 161.470 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 264 total (0 active), Execution time: mean = 52.481 us, total = 13.855 ms, Queueing time: mean = 126.758 us, max = 4.779 ms, min = 11.561 us, total = 33.464 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 132 total (1 
active), Execution time: mean = 1.727 ms, total = 227.937 ms, Queueing time: mean = 68.715 us, max = 163.431 us, min = 15.806 us, total = 9.070 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 22 total (1 active, 1 running), Execution time: mean = 2.449 ms, total = 53.887 ms, Queueing time: mean = 72.871 us, max = 215.454 us, min = 15.835 us, total = 1.603 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean 
= 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 4 total (1 active), Execution time: mean = 299.400 s, total = 1197.600 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 3 total (0 active), Execution time: mean 
= 294.453 us, total = 883.360 us, Queueing time: mean = 76.688 us, max = 160.959 us, min = 20.299 us, total = 230.064 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 4.207 us, total = 8.414 us, Queueing time: mean = 41.935 us, max = 83.871 us, min = 83.871 us, total = 83.871 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:24:54,909 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:24:55,986 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 120698 total (35 active) [state-dump] Queueing time: mean = 599.663 us, 
max = 59.826 s, min = -0.000 s, total = 72.378 s [state-dump] Execution time: mean = 10.111 ms, total = 1220.416 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 28959 total (0 active), Execution time: mean = 541.894 us, total = 15.693 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 28959 total (0 active), Execution time: mean = 36.770 us, total = 1.065 s, Queueing time: mean = 113.263 us, max = 2.644 ms, min = 2.750 us, total = 3.280 s [state-dump] ObjectManager.UpdateAvailableMemory - 13787 total (0 active), Execution time: mean = 6.203 us, total = 85.523 ms, Queueing time: mean = 111.026 us, max = 1.076 ms, min = 2.228 us, total = 1.531 s [state-dump] NodeManager.CheckGC - 13787 total (1 active), Execution time: mean = 2.880 us, total = 39.710 ms, Queueing time: mean = 100.709 us, max = 25.875 ms, min = 3.123 us, total = 1.388 s [state-dump] RaySyncer.OnDemandBroadcasting - 13787 total (1 active), Execution time: mean = 10.912 us, total = 150.440 ms, Queueing time: mean = 93.588 us, max = 25.869 ms, min = 8.560 us, total = 1.290 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 6898 total (1 active), Execution time: mean = 18.738 us, total = 129.252 ms, Queueing time: mean = 79.543 us, max = 26.386 ms, min = -0.000 s, total = 548.685 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 5509 total (1 active), Execution time: mean = 455.481 us, total = 2.509 s, Queueing time: mean = 74.934 us, max = 1.481 ms, min = -0.000 s, total = 412.812 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1380 total (1 active), Execution time: mean = 3.144 us, total = 4.339 ms, Queueing time: mean = 177.978 us, max = 2.222 ms, min = 5.004 us, total = 245.610 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1380 total (1 active), Execution 
time: mean = 9.037 us, total = 12.471 ms, Queueing time: mean = 173.961 us, max = 2.212 ms, min = 6.450 us, total = 240.067 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1380 total (1 active), Execution time: mean = 16.514 us, total = 22.790 ms, Queueing time: mean = 76.887 us, max = 2.581 ms, min = 10.666 us, total = 106.104 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1379 total (0 active), Execution time: mean = 99.588 us, total = 137.331 ms, Queueing time: mean = 114.849 us, max = 1.722 ms, min = 6.847 us, total = 158.376 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1379 total (0 active), Execution time: mean = 624.119 us, total = 860.661 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 461 total (1 active), Execution time: mean = 11.736 us, total = 5.410 ms, Queueing time: mean = 77.756 us, max = 341.916 us, min = 14.912 us, total = 35.846 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 276 total (0 active), Execution time: mean = 1.512 ms, total = 417.177 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 276 total (1 active), Execution time: mean = 567.894 us, total = 156.739 ms, Queueing time: mean = 334.704 us, max = 1.419 ms, min = 11.295 us, total = 92.378 ms [state-dump] NodeManager.GcsCheckAlive - 276 total (1 active), Execution time: mean = 289.263 us, total = 79.836 ms, Queueing time: mean = 612.017 us, max = 1.682 ms, min = 6.690 us, total = 168.917 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 276 total (0 active), Execution time: mean = 52.523 us, total = 14.496 ms, Queueing time: mean = 126.038 us, max = 4.779 ms, min = 11.561 us, total = 34.786 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 138 total (1 
active), Execution time: mean = 1.729 ms, total = 238.649 ms, Queueing time: mean = 68.343 us, max = 163.431 us, min = 15.806 us, total = 9.431 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 23 total (1 active, 1 running), Execution time: mean = 2.474 ms, total = 56.912 ms, Queueing time: mean = 73.448 us, max = 215.454 us, min = 15.835 us, total = 1.689 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean 
= 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 4 total (1 active), Execution time: mean = 299.400 s, total = 1197.600 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 3 total (0 active), Execution time: mean 
= 294.453 us, total = 883.360 us, Queueing time: mean = 76.688 us, max = 160.959 us, min = 20.299 us, total = 230.064 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 4.207 us, total = 8.414 us, Queueing time: mean = 41.935 us, max = 83.871 us, min = 83.871 us, total = 83.871 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:25:54,909 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:25:55,989 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 125929 total (35 active) [state-dump] Queueing time: mean = 577.908 us, 
max = 59.826 s, min = -0.000 s, total = 72.775 s [state-dump] Execution time: mean = 9.699 ms, total = 1221.328 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 30219 total (0 active), Execution time: mean = 541.090 us, total = 16.351 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 30219 total (0 active), Execution time: mean = 36.584 us, total = 1.106 s, Queueing time: mean = 113.116 us, max = 2.644 ms, min = 2.750 us, total = 3.418 s [state-dump] ObjectManager.UpdateAvailableMemory - 14386 total (0 active), Execution time: mean = 6.184 us, total = 88.968 ms, Queueing time: mean = 110.900 us, max = 1.076 ms, min = 2.228 us, total = 1.595 s [state-dump] NodeManager.CheckGC - 14386 total (1 active), Execution time: mean = 2.876 us, total = 41.368 ms, Queueing time: mean = 100.347 us, max = 25.875 ms, min = 3.123 us, total = 1.444 s [state-dump] RaySyncer.OnDemandBroadcasting - 14386 total (1 active), Execution time: mean = 10.863 us, total = 156.277 ms, Queueing time: mean = 93.267 us, max = 25.869 ms, min = 8.560 us, total = 1.342 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 7197 total (1 active), Execution time: mean = 18.645 us, total = 134.185 ms, Queueing time: mean = 79.028 us, max = 26.386 ms, min = -0.000 s, total = 568.763 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 5749 total (1 active), Execution time: mean = 455.487 us, total = 2.619 s, Queueing time: mean = 75.339 us, max = 3.532 ms, min = -0.000 s, total = 433.123 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1440 total (1 active), Execution time: mean = 3.139 us, total = 4.520 ms, Queueing time: mean = 178.266 us, max = 2.222 ms, min = 5.004 us, total = 256.704 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1440 total (1 active), Execution 
time: mean = 9.011 us, total = 12.976 ms, Queueing time: mean = 174.255 us, max = 2.212 ms, min = 6.450 us, total = 250.927 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1440 total (1 active), Execution time: mean = 16.530 us, total = 23.803 ms, Queueing time: mean = 76.772 us, max = 2.581 ms, min = 10.666 us, total = 110.551 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1439 total (0 active), Execution time: mean = 99.403 us, total = 143.042 ms, Queueing time: mean = 114.560 us, max = 1.722 ms, min = 4.027 us, total = 164.852 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1439 total (0 active), Execution time: mean = 623.286 us, total = 896.909 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 481 total (1 active), Execution time: mean = 11.627 us, total = 5.593 ms, Queueing time: mean = 77.323 us, max = 341.916 us, min = 14.912 us, total = 37.192 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 288 total (0 active), Execution time: mean = 1.515 ms, total = 436.200 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 288 total (1 active), Execution time: mean = 569.649 us, total = 164.059 ms, Queueing time: mean = 334.431 us, max = 1.499 ms, min = 11.295 us, total = 96.316 ms [state-dump] NodeManager.GcsCheckAlive - 288 total (1 active), Execution time: mean = 289.195 us, total = 83.288 ms, Queueing time: mean = 613.353 us, max = 1.861 ms, min = 6.690 us, total = 176.646 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 288 total (0 active), Execution time: mean = 52.458 us, total = 15.108 ms, Queueing time: mean = 124.617 us, max = 4.779 ms, min = 11.561 us, total = 35.890 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 144 total (1 
active), Execution time: mean = 1.732 ms, total = 249.460 ms, Queueing time: mean = 68.116 us, max = 163.431 us, min = 15.806 us, total = 9.809 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 24 total (1 active, 1 running), Execution time: mean = 2.489 ms, total = 59.746 ms, Queueing time: mean = 71.342 us, max = 215.454 us, min = 15.835 us, total = 1.712 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean 
= 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 4 total (1 active), Execution time: mean = 299.400 s, total = 1197.600 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 3 total (0 active), Execution time: mean 
= 294.453 us, total = 883.360 us, Queueing time: mean = 76.688 us, max = 160.959 us, min = 20.299 us, total = 230.064 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 4.207 us, total = 8.414 us, Queueing time: mean = 41.935 us, max = 83.871 us, min = 83.871 us, total = 83.871 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:26:54,909 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:26:55,992 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 131163 total (35 active) [state-dump] Queueing time: mean = 557.689 us, 
max = 59.826 s, min = -0.000 s, total = 73.148 s [state-dump] Execution time: mean = 9.318 ms, total = 1222.230 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 31479 total (0 active), Execution time: mean = 540.155 us, total = 17.004 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 31479 total (0 active), Execution time: mean = 36.419 us, total = 1.146 s, Queueing time: mean = 112.583 us, max = 2.841 ms, min = 2.750 us, total = 3.544 s [state-dump] ObjectManager.UpdateAvailableMemory - 14986 total (0 active), Execution time: mean = 6.159 us, total = 92.299 ms, Queueing time: mean = 110.431 us, max = 1.076 ms, min = 2.228 us, total = 1.655 s [state-dump] NodeManager.CheckGC - 14986 total (1 active), Execution time: mean = 2.866 us, total = 42.953 ms, Queueing time: mean = 99.821 us, max = 25.875 ms, min = 3.123 us, total = 1.496 s [state-dump] RaySyncer.OnDemandBroadcasting - 14986 total (1 active), Execution time: mean = 10.792 us, total = 161.725 ms, Queueing time: mean = 92.799 us, max = 25.869 ms, min = 8.560 us, total = 1.391 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 7497 total (1 active), Execution time: mean = 18.586 us, total = 139.341 ms, Queueing time: mean = 78.671 us, max = 26.386 ms, min = -0.000 s, total = 589.797 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 5988 total (1 active), Execution time: mean = 454.628 us, total = 2.722 s, Queueing time: mean = 75.205 us, max = 3.532 ms, min = -0.000 s, total = 450.328 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1500 total (1 active), Execution time: mean = 3.132 us, total = 4.698 ms, Queueing time: mean = 178.273 us, max = 2.222 ms, min = 5.004 us, total = 267.410 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1500 total (1 active), Execution 
time: mean = 8.996 us, total = 13.494 ms, Queueing time: mean = 174.271 us, max = 2.212 ms, min = 6.450 us, total = 261.406 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1500 total (1 active), Execution time: mean = 16.457 us, total = 24.685 ms, Queueing time: mean = 76.219 us, max = 2.581 ms, min = 10.666 us, total = 114.329 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1499 total (0 active), Execution time: mean = 99.129 us, total = 148.594 ms, Queueing time: mean = 115.731 us, max = 2.934 ms, min = 4.027 us, total = 173.480 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1499 total (0 active), Execution time: mean = 623.695 us, total = 934.919 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 501 total (1 active), Execution time: mean = 11.501 us, total = 5.762 ms, Queueing time: mean = 76.866 us, max = 341.916 us, min = 14.912 us, total = 38.510 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 300 total (0 active), Execution time: mean = 1.520 ms, total = 456.025 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 300 total (1 active), Execution time: mean = 568.675 us, total = 170.602 ms, Queueing time: mean = 335.353 us, max = 1.499 ms, min = 11.295 us, total = 100.606 ms [state-dump] NodeManager.GcsCheckAlive - 300 total (1 active), Execution time: mean = 290.030 us, total = 87.009 ms, Queueing time: mean = 613.019 us, max = 1.861 ms, min = 6.690 us, total = 183.906 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 300 total (0 active), Execution time: mean = 52.480 us, total = 15.744 ms, Queueing time: mean = 122.926 us, max = 4.779 ms, min = 11.561 us, total = 36.878 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 150 total (1 
active), Execution time: mean = 1.733 ms, total = 259.933 ms, Queueing time: mean = 67.921 us, max = 163.431 us, min = 15.806 us, total = 10.188 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 25 total (1 active, 1 running), Execution time: mean = 2.505 ms, total = 62.622 ms, Queueing time: mean = 71.794 us, max = 215.454 us, min = 15.835 us, total = 1.795 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: 
mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 4 total (1 active), Execution time: mean = 299.400 s, total = 1197.600 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 3 total (0 active), Execution time: 
mean = 294.453 us, total = 883.360 us, Queueing time: mean = 76.688 us, max = 160.959 us, min = 20.299 us, total = 230.064 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 4.207 us, total = 8.414 us, Queueing time: mean = 41.935 us, max = 83.871 us, min = 83.871 us, total = 83.871 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:27:54,910 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:27:55,994 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 136395 total (35 active) [state-dump] Queueing time: mean = 539.045 us, 
max = 59.826 s, min = -0.000 s, total = 73.523 s [state-dump] Execution time: mean = 8.967 ms, total = 1223.108 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 32739 total (0 active), Execution time: mean = 538.715 us, total = 17.637 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 32739 total (0 active), Execution time: mean = 36.222 us, total = 1.186 s, Queueing time: mean = 112.176 us, max = 2.841 ms, min = 2.750 us, total = 3.673 s [state-dump] ObjectManager.UpdateAvailableMemory - 15585 total (0 active), Execution time: mean = 6.133 us, total = 95.589 ms, Queueing time: mean = 110.047 us, max = 1.076 ms, min = 2.228 us, total = 1.715 s [state-dump] NodeManager.CheckGC - 15585 total (1 active), Execution time: mean = 2.859 us, total = 44.553 ms, Queueing time: mean = 99.399 us, max = 25.875 ms, min = 3.123 us, total = 1.549 s [state-dump] RaySyncer.OnDemandBroadcasting - 15585 total (1 active), Execution time: mean = 10.732 us, total = 167.262 ms, Queueing time: mean = 92.425 us, max = 25.869 ms, min = 8.560 us, total = 1.440 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 7797 total (1 active), Execution time: mean = 18.508 us, total = 144.308 ms, Queueing time: mean = 78.238 us, max = 26.386 ms, min = -0.000 s, total = 610.025 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 6228 total (1 active), Execution time: mean = 453.988 us, total = 2.827 s, Queueing time: mean = 74.906 us, max = 3.532 ms, min = -0.000 s, total = 466.516 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1560 total (1 active), Execution time: mean = 3.122 us, total = 4.870 ms, Queueing time: mean = 178.376 us, max = 2.222 ms, min = 5.004 us, total = 278.266 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1560 total (1 active), Execution 
time: mean = 8.964 us, total = 13.983 ms, Queueing time: mean = 174.386 us, max = 2.212 ms, min = 6.450 us, total = 272.042 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1560 total (1 active), Execution time: mean = 16.416 us, total = 25.609 ms, Queueing time: mean = 76.201 us, max = 2.581 ms, min = 10.666 us, total = 118.873 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1559 total (0 active), Execution time: mean = 98.844 us, total = 154.098 ms, Queueing time: mean = 115.005 us, max = 2.934 ms, min = 4.027 us, total = 179.292 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1559 total (0 active), Execution time: mean = 621.042 us, total = 968.204 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 521 total (1 active), Execution time: mean = 11.387 us, total = 5.933 ms, Queueing time: mean = 76.441 us, max = 341.916 us, min = 14.912 us, total = 39.826 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 312 total (0 active), Execution time: mean = 1.523 ms, total = 475.182 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 312 total (1 active), Execution time: mean = 568.731 us, total = 177.444 ms, Queueing time: mean = 336.806 us, max = 1.499 ms, min = 11.295 us, total = 105.083 ms [state-dump] NodeManager.GcsCheckAlive - 312 total (1 active), Execution time: mean = 290.636 us, total = 90.678 ms, Queueing time: mean = 613.215 us, max = 1.861 ms, min = 6.690 us, total = 191.323 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 312 total (0 active), Execution time: mean = 52.509 us, total = 16.383 ms, Queueing time: mean = 122.391 us, max = 4.779 ms, min = 11.561 us, total = 38.186 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 156 total (1 
active), Execution time: mean = 1.735 ms, total = 270.679 ms, Queueing time: mean = 68.162 us, max = 163.431 us, min = 15.806 us, total = 10.633 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 26 total (1 active, 1 running), Execution time: mean = 2.520 ms, total = 65.512 ms, Queueing time: mean = 72.515 us, max = 215.454 us, min = 15.835 us, total = 1.885 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: 
mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 4 total (1 active), Execution time: mean = 299.400 s, total = 1197.600 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 3 total (0 active), Execution time: 
mean = 294.453 us, total = 883.360 us, Queueing time: mean = 76.688 us, max = 160.959 us, min = 20.299 us, total = 230.064 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 4.207 us, total = 8.414 us, Queueing time: mean = 41.935 us, max = 83.871 us, min = 83.871 us, total = 83.871 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:28:54,910 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:28:55,997 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 141629 total (35 active) [state-dump] Queueing time: mean = 521.662 us, 
max = 59.826 s, min = -0.000 s, total = 73.882 s [state-dump] Execution time: mean = 8.642 ms, total = 1223.988 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 33999 total (0 active), Execution time: mean = 537.356 us, total = 18.270 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 33999 total (0 active), Execution time: mean = 36.066 us, total = 1.226 s, Queueing time: mean = 111.556 us, max = 2.841 ms, min = 2.750 us, total = 3.793 s [state-dump] ObjectManager.UpdateAvailableMemory - 16185 total (0 active), Execution time: mean = 6.106 us, total = 98.819 ms, Queueing time: mean = 109.350 us, max = 1.076 ms, min = 2.228 us, total = 1.770 s [state-dump] NodeManager.CheckGC - 16185 total (1 active), Execution time: mean = 2.853 us, total = 46.179 ms, Queueing time: mean = 99.131 us, max = 25.875 ms, min = 3.123 us, total = 1.604 s [state-dump] RaySyncer.OnDemandBroadcasting - 16185 total (1 active), Execution time: mean = 10.706 us, total = 173.272 ms, Queueing time: mean = 92.177 us, max = 25.869 ms, min = 8.560 us, total = 1.492 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 8097 total (1 active), Execution time: mean = 18.454 us, total = 149.425 ms, Queueing time: mean = 77.813 us, max = 26.386 ms, min = -0.000 s, total = 630.054 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 6467 total (1 active), Execution time: mean = 453.642 us, total = 2.934 s, Queueing time: mean = 74.671 us, max = 3.532 ms, min = -0.000 s, total = 482.899 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1620 total (1 active), Execution time: mean = 3.125 us, total = 5.063 ms, Queueing time: mean = 177.862 us, max = 2.222 ms, min = 5.004 us, total = 288.137 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1620 total (1 active), Execution 
time: mean = 8.989 us, total = 14.563 ms, Queueing time: mean = 173.867 us, max = 2.212 ms, min = 6.450 us, total = 281.664 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1620 total (1 active), Execution time: mean = 16.327 us, total = 26.449 ms, Queueing time: mean = 75.378 us, max = 2.581 ms, min = 10.666 us, total = 122.112 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1619 total (0 active), Execution time: mean = 98.401 us, total = 159.312 ms, Queueing time: mean = 114.535 us, max = 2.934 ms, min = 4.027 us, total = 185.433 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1619 total (0 active), Execution time: mean = 618.812 us, total = 1.002 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 541 total (1 active), Execution time: mean = 11.274 us, total = 6.099 ms, Queueing time: mean = 76.829 us, max = 442.307 us, min = 14.912 us, total = 41.565 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 324 total (0 active), Execution time: mean = 1.530 ms, total = 495.779 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 324 total (1 active), Execution time: mean = 568.900 us, total = 184.324 ms, Queueing time: mean = 333.233 us, max = 1.499 ms, min = 11.295 us, total = 107.967 ms [state-dump] NodeManager.GcsCheckAlive - 324 total (1 active), Execution time: mean = 290.738 us, total = 94.199 ms, Queueing time: mean = 610.217 us, max = 1.861 ms, min = 6.690 us, total = 197.710 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 324 total (0 active), Execution time: mean = 52.858 us, total = 17.126 ms, Queueing time: mean = 120.758 us, max = 4.779 ms, min = 11.561 us, total = 39.126 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 162 total (1 active), 
Execution time: mean = 1.730 ms, total = 280.303 ms, Queueing time: mean = 67.456 us, max = 163.431 us, min = 15.806 us, total = 10.928 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 27 total (1 active, 1 running), Execution time: mean = 2.523 ms, total = 68.134 ms, Queueing time: mean = 72.715 us, max = 215.454 us, min = 15.835 us, total = 1.963 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 4 total (1 active), Execution time: mean = 299.400 s, total = 1197.600 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 3 total (0 active), Execution time: mean = 
294.453 us, total = 883.360 us, Queueing time: mean = 76.688 us, max = 160.959 us, min = 20.299 us, total = 230.064 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 4.207 us, total = 8.414 us, Queueing time: mean = 41.935 us, max = 83.871 us, min = 83.871 us, total = 83.871 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 23:29:54,911 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:29:56,000 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 146861 total (35 active) [state-dump] Queueing time: mean = 505.989 us, 
max = 59.826 s, min = -0.000 s, total = 74.310 s [state-dump] Execution time: mean = 8.341 ms, total = 1224.955 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 35259 total (0 active), Execution time: mean = 537.965 us, total = 18.968 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 35259 total (0 active), Execution time: mean = 36.107 us, total = 1.273 s, Queueing time: mean = 111.848 us, max = 2.841 ms, min = 2.750 us, total = 3.944 s [state-dump] ObjectManager.UpdateAvailableMemory - 16784 total (0 active), Execution time: mean = 6.117 us, total = 102.676 ms, Queueing time: mean = 109.419 us, max = 1.076 ms, min = 2.228 us, total = 1.836 s [state-dump] NodeManager.CheckGC - 16784 total (1 active), Execution time: mean = 2.857 us, total = 47.948 ms, Queueing time: mean = 99.151 us, max = 25.875 ms, min = 3.123 us, total = 1.664 s [state-dump] RaySyncer.OnDemandBroadcasting - 16784 total (1 active), Execution time: mean = 10.713 us, total = 179.811 ms, Queueing time: mean = 92.195 us, max = 25.869 ms, min = 8.560 us, total = 1.547 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 8397 total (1 active), Execution time: mean = 18.516 us, total = 155.477 ms, Queueing time: mean = 77.755 us, max = 26.386 ms, min = -0.000 s, total = 652.907 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 6707 total (1 active), Execution time: mean = 453.904 us, total = 3.044 s, Queueing time: mean = 74.806 us, max = 3.532 ms, min = -0.000 s, total = 501.723 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1680 total (1 active), Execution time: mean = 3.132 us, total = 5.261 ms, Queueing time: mean = 178.791 us, max = 2.222 ms, min = 5.004 us, total = 300.369 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1680 total (1 active), Execution 
time: mean = 9.003 us, total = 15.125 ms, Queueing time: mean = 174.793 us, max = 2.212 ms, min = 6.450 us, total = 293.652 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1680 total (1 active), Execution time: mean = 16.443 us, total = 27.624 ms, Queueing time: mean = 75.537 us, max = 2.581 ms, min = 10.666 us, total = 126.902 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1679 total (0 active), Execution time: mean = 98.383 us, total = 165.185 ms, Queueing time: mean = 114.776 us, max = 2.934 ms, min = 4.027 us, total = 192.710 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1679 total (0 active), Execution time: mean = 618.353 us, total = 1.038 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 561 total (1 active), Execution time: mean = 11.265 us, total = 6.320 ms, Queueing time: mean = 77.053 us, max = 442.307 us, min = 14.912 us, total = 43.227 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 336 total (0 active), Execution time: mean = 1.537 ms, total = 516.336 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 336 total (1 active), Execution time: mean = 572.372 us, total = 192.317 ms, Queueing time: mean = 334.787 us, max = 1.499 ms, min = 11.295 us, total = 112.488 ms [state-dump] NodeManager.GcsCheckAlive - 336 total (1 active), Execution time: mean = 291.988 us, total = 98.108 ms, Queueing time: mean = 613.996 us, max = 1.922 ms, min = 6.690 us, total = 206.303 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 336 total (0 active), Execution time: mean = 53.157 us, total = 17.861 ms, Queueing time: mean = 121.007 us, max = 4.779 ms, min = 11.561 us, total = 40.658 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 168 total (1 active), 
Execution time: mean = 1.740 ms, total = 292.361 ms, Queueing time: mean = 67.758 us, max = 163.431 us, min = 15.806 us, total = 11.383 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 28 total (1 active, 1 running), Execution time: mean = 2.541 ms, total = 71.150 ms, Queueing time: mean = 73.136 us, max = 215.454 us, min = 15.835 us, total = 2.048 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 4 total (1 active), Execution time: mean = 299.400 s, total = 1197.600 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 3 total (0 active), Execution time: mean = 
294.453 us, total = 883.360 us, Queueing time: mean = 76.688 us, max = 160.959 us, min = 20.299 us, total = 230.064 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 4.207 us, total = 8.414 us, Queueing time: mean = 41.935 us, max = 83.871 us, min = 83.871 us, total = 83.871 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 23:30:54,911 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:30:56,003 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 152095 total (35 active) [state-dump] Queueing time: mean = 491.260 us, 
max = 59.826 s, min = -0.000 s, total = 74.718 s [state-dump] Execution time: mean = 8.060 ms, total = 1225.907 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 36519 total (0 active), Execution time: mean = 538.359 us, total = 19.660 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 36519 total (0 active), Execution time: mean = 36.057 us, total = 1.317 s, Queueing time: mean = 111.796 us, max = 2.841 ms, min = 2.750 us, total = 4.083 s [state-dump] ObjectManager.UpdateAvailableMemory - 17384 total (0 active), Execution time: mean = 6.116 us, total = 106.323 ms, Queueing time: mean = 109.556 us, max = 1.076 ms, min = 2.228 us, total = 1.905 s [state-dump] NodeManager.CheckGC - 17384 total (1 active), Execution time: mean = 2.850 us, total = 49.551 ms, Queueing time: mean = 99.132 us, max = 25.875 ms, min = 3.123 us, total = 1.723 s [state-dump] RaySyncer.OnDemandBroadcasting - 17384 total (1 active), Execution time: mean = 10.685 us, total = 185.744 ms, Queueing time: mean = 92.196 us, max = 25.869 ms, min = 8.560 us, total = 1.603 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 8697 total (1 active), Execution time: mean = 18.498 us, total = 160.878 ms, Queueing time: mean = 77.719 us, max = 26.386 ms, min = -0.000 s, total = 675.923 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 6946 total (1 active), Execution time: mean = 453.971 us, total = 3.153 s, Queueing time: mean = 74.754 us, max = 3.532 ms, min = -0.000 s, total = 519.243 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1740 total (1 active), Execution time: mean = 3.136 us, total = 5.457 ms, Queueing time: mean = 178.628 us, max = 2.222 ms, min = 3.958 us, total = 310.812 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1740 total (1 active), Execution 
time: mean = 9.002 us, total = 15.664 ms, Queueing time: mean = 174.632 us, max = 2.212 ms, min = 6.450 us, total = 303.860 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1740 total (1 active), Execution time: mean = 16.427 us, total = 28.583 ms, Queueing time: mean = 75.457 us, max = 2.581 ms, min = 10.666 us, total = 131.294 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1739 total (0 active), Execution time: mean = 98.339 us, total = 171.011 ms, Queueing time: mean = 114.680 us, max = 2.934 ms, min = 4.027 us, total = 199.429 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1739 total (0 active), Execution time: mean = 619.202 us, total = 1.077 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 581 total (1 active), Execution time: mean = 11.172 us, total = 6.491 ms, Queueing time: mean = 76.908 us, max = 442.307 us, min = 14.912 us, total = 44.684 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 348 total (0 active), Execution time: mean = 1.542 ms, total = 536.658 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 348 total (1 active), Execution time: mean = 571.225 us, total = 198.786 ms, Queueing time: mean = 335.448 us, max = 1.499 ms, min = 10.616 us, total = 116.736 ms [state-dump] NodeManager.GcsCheckAlive - 348 total (1 active), Execution time: mean = 292.147 us, total = 101.667 ms, Queueing time: mean = 613.276 us, max = 1.922 ms, min = 6.690 us, total = 213.420 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 348 total (0 active), Execution time: mean = 53.105 us, total = 18.481 ms, Queueing time: mean = 120.075 us, max = 4.779 ms, min = 11.561 us, total = 41.786 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 174 total (1 active), 
Execution time: mean = 1.741 ms, total = 302.873 ms, Queueing time: mean = 67.462 us, max = 163.431 us, min = 13.687 us, total = 11.738 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 29 total (1 active, 1 running), Execution time: mean = 2.554 ms, total = 74.075 ms, Queueing time: mean = 72.774 us, max = 215.454 us, min = 15.835 us, total = 2.110 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 4 total (1 active), Execution time: mean = 299.400 s, total = 1197.600 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 3 total (0 active), Execution time: mean = 
294.453 us, total = 883.360 us, Queueing time: mean = 76.688 us, max = 160.959 us, min = 20.299 us, total = 230.064 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 2 total (1 active), Execution time: mean = 4.207 us, total = 8.414 us, Queueing time: mean = 41.935 us, max = 83.871 us, min = 83.871 us, total = 83.871 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 23:31:54,911 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:31:56,006 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 157330 total (35 active) [state-dump] Queueing time: mean = 477.676 us, 
max = 59.826 s, min = -0.000 s, total = 75.153 s [state-dump] Execution time: mean = 11.612 ms, total = 1826.879 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 37779 total (0 active), Execution time: mean = 539.052 us, total = 20.365 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 37779 total (0 active), Execution time: mean = 36.064 us, total = 1.362 s, Queueing time: mean = 112.145 us, max = 2.841 ms, min = 2.750 us, total = 4.237 s [state-dump] ObjectManager.UpdateAvailableMemory - 17983 total (0 active), Execution time: mean = 6.128 us, total = 110.203 ms, Queueing time: mean = 109.689 us, max = 1.076 ms, min = 2.228 us, total = 1.973 s [state-dump] NodeManager.CheckGC - 17983 total (1 active), Execution time: mean = 2.852 us, total = 51.289 ms, Queueing time: mean = 99.320 us, max = 25.875 ms, min = 3.123 us, total = 1.786 s [state-dump] RaySyncer.OnDemandBroadcasting - 17983 total (1 active), Execution time: mean = 10.716 us, total = 192.700 ms, Queueing time: mean = 92.357 us, max = 25.869 ms, min = 8.560 us, total = 1.661 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 8997 total (1 active), Execution time: mean = 18.525 us, total = 166.667 ms, Queueing time: mean = 77.643 us, max = 26.386 ms, min = -0.000 s, total = 698.557 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 7186 total (1 active), Execution time: mean = 454.244 us, total = 3.264 s, Queueing time: mean = 75.089 us, max = 3.532 ms, min = -0.000 s, total = 539.589 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1800 total (1 active), Execution time: mean = 3.136 us, total = 5.644 ms, Queueing time: mean = 178.958 us, max = 2.222 ms, min = 3.958 us, total = 322.124 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1800 total (1 active), Execution 
time: mean = 9.036 us, total = 16.265 ms, Queueing time: mean = 174.935 us, max = 2.212 ms, min = 6.450 us, total = 314.884 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1800 total (1 active), Execution time: mean = 16.528 us, total = 29.750 ms, Queueing time: mean = 75.540 us, max = 2.581 ms, min = 10.666 us, total = 135.972 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1799 total (0 active), Execution time: mean = 98.357 us, total = 176.944 ms, Queueing time: mean = 114.452 us, max = 2.934 ms, min = 4.027 us, total = 205.900 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1799 total (0 active), Execution time: mean = 618.279 us, total = 1.112 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 601 total (1 active), Execution time: mean = 11.159 us, total = 6.706 ms, Queueing time: mean = 77.500 us, max = 442.307 us, min = 14.912 us, total = 46.577 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 360 total (0 active), Execution time: mean = 1.549 ms, total = 557.565 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 360 total (1 active), Execution time: mean = 572.515 us, total = 206.106 ms, Queueing time: mean = 335.537 us, max = 1.499 ms, min = 10.616 us, total = 120.793 ms [state-dump] NodeManager.GcsCheckAlive - 360 total (1 active), Execution time: mean = 294.601 us, total = 106.056 ms, Queueing time: mean = 612.159 us, max = 1.922 ms, min = 6.690 us, total = 220.377 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 360 total (0 active), Execution time: mean = 53.281 us, total = 19.181 ms, Queueing time: mean = 120.090 us, max = 4.779 ms, min = 11.561 us, total = 43.232 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 180 total (1 active), 
Execution time: mean = 1.742 ms, total = 313.543 ms, Queueing time: mean = 67.755 us, max = 163.431 us, min = 13.687 us, total = 12.196 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 30 total (1 active, 1 running), Execution time: mean = 2.557 ms, total = 76.718 ms, Queueing time: mean = 73.184 us, max = 215.454 us, min = 15.835 us, total = 2.196 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 5 total (1 active), Execution time: mean = 359.520 s, total = 1797.602 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 4 total (0 active), Execution time: mean = 
320.800 us, total = 1.283 ms, Queueing time: mean = 94.198 us, max = 160.959 us, min = 20.299 us, total = 376.792 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 6.632 us, total = 19.896 us, Queueing time: mean = 60.387 us, max = 97.290 us, min = 83.871 us, total = 181.161 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 23:32:54,912 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:32:56,009 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 162564 total (35 active) [state-dump] Queueing time: mean = 464.662 us, 
max = 59.826 s, min = -0.000 s, total = 75.537 s [state-dump] Execution time: mean = 11.244 ms, total = 1827.795 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 39039 total (0 active), Execution time: mean = 538.527 us, total = 21.024 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 39039 total (0 active), Execution time: mean = 36.015 us, total = 1.406 s, Queueing time: mean = 112.024 us, max = 2.841 ms, min = 2.750 us, total = 4.373 s [state-dump] ObjectManager.UpdateAvailableMemory - 18583 total (0 active), Execution time: mean = 6.119 us, total = 113.701 ms, Queueing time: mean = 109.337 us, max = 1.076 ms, min = 2.228 us, total = 2.032 s [state-dump] NodeManager.CheckGC - 18583 total (1 active), Execution time: mean = 2.853 us, total = 53.017 ms, Queueing time: mean = 98.993 us, max = 25.875 ms, min = 2.848 us, total = 1.840 s [state-dump] RaySyncer.OnDemandBroadcasting - 18583 total (1 active), Execution time: mean = 10.693 us, total = 198.706 ms, Queueing time: mean = 92.051 us, max = 25.869 ms, min = 6.166 us, total = 1.711 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 9297 total (1 active), Execution time: mean = 18.531 us, total = 172.285 ms, Queueing time: mean = 77.450 us, max = 26.386 ms, min = -0.000 s, total = 720.056 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 7425 total (1 active), Execution time: mean = 454.485 us, total = 3.375 s, Queueing time: mean = 75.017 us, max = 3.532 ms, min = -0.000 s, total = 556.998 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1860 total (1 active), Execution time: mean = 3.136 us, total = 5.833 ms, Queueing time: mean = 179.126 us, max = 2.222 ms, min = 3.958 us, total = 333.175 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1860 total (1 active), Execution 
time: mean = 9.046 us, total = 16.825 ms, Queueing time: mean = 175.099 us, max = 2.212 ms, min = 6.450 us, total = 325.684 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1860 total (1 active), Execution time: mean = 16.523 us, total = 30.732 ms, Queueing time: mean = 75.237 us, max = 2.581 ms, min = 10.666 us, total = 139.940 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1859 total (0 active), Execution time: mean = 98.246 us, total = 182.640 ms, Queueing time: mean = 113.908 us, max = 2.934 ms, min = 4.027 us, total = 211.754 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1859 total (0 active), Execution time: mean = 616.574 us, total = 1.146 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 621 total (1 active), Execution time: mean = 11.103 us, total = 6.895 ms, Queueing time: mean = 77.263 us, max = 442.307 us, min = 14.912 us, total = 47.980 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 372 total (0 active), Execution time: mean = 1.551 ms, total = 576.882 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 372 total (1 active), Execution time: mean = 573.481 us, total = 213.335 ms, Queueing time: mean = 335.554 us, max = 1.499 ms, min = 10.616 us, total = 124.826 ms [state-dump] NodeManager.GcsCheckAlive - 372 total (1 active), Execution time: mean = 295.049 us, total = 109.758 ms, Queueing time: mean = 612.785 us, max = 1.922 ms, min = 6.690 us, total = 227.956 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 372 total (0 active), Execution time: mean = 53.288 us, total = 19.823 ms, Queueing time: mean = 119.970 us, max = 4.779 ms, min = 11.561 us, total = 44.629 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 186 total (1 active), 
Execution time: mean = 1.744 ms, total = 324.331 ms, Queueing time: mean = 67.665 us, max = 163.431 us, min = 13.687 us, total = 12.586 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 31 total (1 active, 1 running), Execution time: mean = 2.572 ms, total = 79.733 ms, Queueing time: mean = 73.569 us, max = 215.454 us, min = 15.835 us, total = 2.281 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 5 total (1 active), Execution time: mean = 359.520 s, total = 1797.602 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 4 total (0 active), Execution time: mean = 
320.800 us, total = 1.283 ms, Queueing time: mean = 94.198 us, max = 160.959 us, min = 20.299 us, total = 376.792 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 6.632 us, total = 19.896 us, Queueing time: mean = 60.387 us, max = 97.290 us, min = 83.871 us, total = 181.161 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:33:54,912 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:33:56,012 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 167795 total (35 active) [state-dump] Queueing time: mean = 452.742 us, 
max = 59.826 s, min = -0.000 s, total = 75.968 s [state-dump] Execution time: mean = 10.899 ms, total = 1828.755 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 40299 total (0 active), Execution time: mean = 538.967 us, total = 21.720 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 40299 total (0 active), Execution time: mean = 35.964 us, total = 1.449 s, Queueing time: mean = 112.125 us, max = 2.841 ms, min = 2.750 us, total = 4.519 s [state-dump] ObjectManager.UpdateAvailableMemory - 19182 total (0 active), Execution time: mean = 6.132 us, total = 117.615 ms, Queueing time: mean = 109.744 us, max = 1.076 ms, min = 2.228 us, total = 2.105 s [state-dump] NodeManager.CheckGC - 19182 total (1 active), Execution time: mean = 2.853 us, total = 54.717 ms, Queueing time: mean = 99.013 us, max = 25.875 ms, min = 2.848 us, total = 1.899 s [state-dump] RaySyncer.OnDemandBroadcasting - 19182 total (1 active), Execution time: mean = 10.689 us, total = 205.037 ms, Queueing time: mean = 92.075 us, max = 25.869 ms, min = 6.166 us, total = 1.766 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 9596 total (1 active), Execution time: mean = 18.541 us, total = 177.923 ms, Queueing time: mean = 77.684 us, max = 26.386 ms, min = -0.000 s, total = 745.458 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 7665 total (1 active), Execution time: mean = 454.711 us, total = 3.485 s, Queueing time: mean = 75.051 us, max = 3.532 ms, min = -0.000 s, total = 575.264 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1920 total (1 active), Execution time: mean = 3.134 us, total = 6.018 ms, Queueing time: mean = 180.033 us, max = 2.473 ms, min = 3.958 us, total = 345.663 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1920 total (1 active), Execution 
time: mean = 9.054 us, total = 17.384 ms, Queueing time: mean = 175.995 us, max = 2.483 ms, min = 6.450 us, total = 337.910 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1920 total (1 active), Execution time: mean = 16.553 us, total = 31.781 ms, Queueing time: mean = 75.410 us, max = 2.581 ms, min = 10.666 us, total = 144.786 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1919 total (0 active), Execution time: mean = 98.211 us, total = 188.468 ms, Queueing time: mean = 113.615 us, max = 2.934 ms, min = 4.027 us, total = 218.026 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1919 total (0 active), Execution time: mean = 616.531 us, total = 1.183 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 641 total (1 active), Execution time: mean = 11.059 us, total = 7.089 ms, Queueing time: mean = 77.329 us, max = 442.307 us, min = 14.912 us, total = 49.568 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 384 total (0 active), Execution time: mean = 1.556 ms, total = 597.538 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 384 total (1 active), Execution time: mean = 575.034 us, total = 220.813 ms, Queueing time: mean = 338.499 us, max = 2.010 ms, min = 10.616 us, total = 129.984 ms [state-dump] NodeManager.GcsCheckAlive - 384 total (1 active), Execution time: mean = 296.113 us, total = 113.707 ms, Queueing time: mean = 616.215 us, max = 2.522 ms, min = 6.690 us, total = 236.627 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 384 total (0 active), Execution time: mean = 53.459 us, total = 20.528 ms, Queueing time: mean = 119.576 us, max = 4.779 ms, min = 11.561 us, total = 45.917 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 192 total (1 active), 
Execution time: mean = 1.753 ms, total = 336.627 ms, Queueing time: mean = 67.898 us, max = 163.431 us, min = 13.687 us, total = 13.036 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 32 total (1 active, 1 running), Execution time: mean = 2.573 ms, total = 82.343 ms, Queueing time: mean = 71.813 us, max = 215.454 us, min = 15.835 us, total = 2.298 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 5 total (1 active), Execution time: mean = 359.520 s, total = 1797.602 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 4 total (0 active), Execution time: mean = 
320.800 us, total = 1.283 ms, Queueing time: mean = 94.198 us, max = 160.959 us, min = 20.299 us, total = 376.792 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 6.632 us, total = 19.896 us, Queueing time: mean = 60.387 us, max = 97.290 us, min = 83.871 us, total = 181.161 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:34:54,913 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:34:56,015 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 173026 total (35 active) [state-dump] Queueing time: mean = 441.368 us, 
max = 59.826 s, min = -0.000 s, total = 76.368 s [state-dump] Execution time: mean = 10.575 ms, total = 1829.685 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 41559 total (0 active), Execution time: mean = 538.864 us, total = 22.395 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 41559 total (0 active), Execution time: mean = 35.896 us, total = 1.492 s, Queueing time: mean = 112.066 us, max = 2.841 ms, min = 2.750 us, total = 4.657 s [state-dump] ObjectManager.UpdateAvailableMemory - 19781 total (0 active), Execution time: mean = 6.125 us, total = 121.160 ms, Queueing time: mean = 109.743 us, max = 1.076 ms, min = 2.228 us, total = 2.171 s [state-dump] NodeManager.CheckGC - 19781 total (1 active), Execution time: mean = 2.850 us, total = 56.376 ms, Queueing time: mean = 98.884 us, max = 25.875 ms, min = 2.848 us, total = 1.956 s [state-dump] RaySyncer.OnDemandBroadcasting - 19781 total (1 active), Execution time: mean = 10.669 us, total = 211.038 ms, Queueing time: mean = 91.963 us, max = 25.869 ms, min = 6.166 us, total = 1.819 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 9896 total (1 active), Execution time: mean = 18.482 us, total = 182.897 ms, Queueing time: mean = 77.580 us, max = 26.386 ms, min = -0.000 s, total = 767.732 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 7904 total (1 active), Execution time: mean = 454.769 us, total = 3.594 s, Queueing time: mean = 74.946 us, max = 3.532 ms, min = -0.000 s, total = 592.373 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1980 total (1 active), Execution time: mean = 3.130 us, total = 6.198 ms, Queueing time: mean = 180.190 us, max = 2.473 ms, min = 3.958 us, total = 356.777 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 1980 total (1 active), Execution 
time: mean = 9.052 us, total = 17.922 ms, Queueing time: mean = 176.149 us, max = 2.483 ms, min = 6.450 us, total = 348.775 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 1980 total (1 active), Execution time: mean = 16.522 us, total = 32.714 ms, Queueing time: mean = 75.182 us, max = 2.581 ms, min = 10.666 us, total = 148.860 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 1979 total (0 active), Execution time: mean = 98.159 us, total = 194.256 ms, Queueing time: mean = 113.230 us, max = 2.934 ms, min = 4.027 us, total = 224.082 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 1979 total (0 active), Execution time: mean = 615.360 us, total = 1.218 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 661 total (1 active), Execution time: mean = 11.003 us, total = 7.273 ms, Queueing time: mean = 77.148 us, max = 442.307 us, min = 14.912 us, total = 50.995 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 396 total (0 active), Execution time: mean = 1.558 ms, total = 616.988 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 396 total (1 active), Execution time: mean = 576.531 us, total = 228.306 ms, Queueing time: mean = 338.190 us, max = 2.010 ms, min = 10.616 us, total = 133.923 ms [state-dump] NodeManager.GcsCheckAlive - 396 total (1 active), Execution time: mean = 297.253 us, total = 117.712 ms, Queueing time: mean = 616.164 us, max = 2.522 ms, min = 6.690 us, total = 244.001 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 396 total (0 active), Execution time: mean = 53.533 us, total = 21.199 ms, Queueing time: mean = 119.227 us, max = 4.779 ms, min = 11.561 us, total = 47.214 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 198 total (1 active), 
Execution time: mean = 1.755 ms, total = 347.558 ms, Queueing time: mean = 67.875 us, max = 163.431 us, min = 13.687 us, total = 13.439 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 33 total (1 active, 1 running), Execution time: mean = 2.583 ms, total = 85.237 ms, Queueing time: mean = 71.786 us, max = 215.454 us, min = 15.835 us, total = 2.369 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 5 total (1 active), Execution time: mean = 359.520 s, total = 1797.602 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 4 total (0 active), Execution time: mean = 
320.800 us, total = 1.283 ms, Queueing time: mean = 94.198 us, max = 160.959 us, min = 20.299 us, total = 376.792 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 6.632 us, total = 19.896 us, Queueing time: mean = 60.387 us, max = 97.290 us, min = 83.871 us, total = 181.161 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:35:54,913 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:35:56,018 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 178261 total (35 active) [state-dump] Queueing time: mean = 430.675 us, 
max = 59.826 s, min = -0.000 s, total = 76.772 s [state-dump] Execution time: mean = 10.269 ms, total = 1830.626 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 42819 total (0 active), Execution time: mean = 538.885 us, total = 23.075 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 42819 total (0 active), Execution time: mean = 35.821 us, total = 1.534 s, Queueing time: mean = 112.027 us, max = 2.841 ms, min = 2.750 us, total = 4.797 s [state-dump] ObjectManager.UpdateAvailableMemory - 20381 total (0 active), Execution time: mean = 6.124 us, total = 124.805 ms, Queueing time: mean = 109.619 us, max = 1.076 ms, min = 2.228 us, total = 2.234 s [state-dump] NodeManager.CheckGC - 20381 total (1 active), Execution time: mean = 2.850 us, total = 58.093 ms, Queueing time: mean = 98.810 us, max = 25.875 ms, min = 2.848 us, total = 2.014 s [state-dump] RaySyncer.OnDemandBroadcasting - 20381 total (1 active), Execution time: mean = 10.659 us, total = 217.240 ms, Queueing time: mean = 91.898 us, max = 25.869 ms, min = 6.166 us, total = 1.873 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 10196 total (1 active), Execution time: mean = 18.461 us, total = 188.232 ms, Queueing time: mean = 77.482 us, max = 26.386 ms, min = -0.000 s, total = 790.005 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 8144 total (1 active), Execution time: mean = 454.856 us, total = 3.704 s, Queueing time: mean = 74.958 us, max = 3.532 ms, min = -0.000 s, total = 610.456 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2040 total (1 active), Execution time: mean = 3.125 us, total = 6.375 ms, Queueing time: mean = 180.672 us, max = 2.473 ms, min = 3.958 us, total = 368.570 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2040 total (1 active), Execution 
time: mean = 9.059 us, total = 18.480 ms, Queueing time: mean = 176.617 us, max = 2.483 ms, min = 6.450 us, total = 360.298 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2040 total (1 active), Execution time: mean = 16.519 us, total = 33.698 ms, Queueing time: mean = 75.155 us, max = 2.581 ms, min = 10.666 us, total = 153.315 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2039 total (0 active), Execution time: mean = 98.101 us, total = 200.027 ms, Queueing time: mean = 113.406 us, max = 2.934 ms, min = 4.027 us, total = 231.234 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2039 total (0 active), Execution time: mean = 615.600 us, total = 1.255 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 681 total (1 active), Execution time: mean = 10.933 us, total = 7.446 ms, Queueing time: mean = 77.047 us, max = 442.307 us, min = 14.912 us, total = 52.469 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 408 total (0 active), Execution time: mean = 1.561 ms, total = 637.054 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 408 total (1 active), Execution time: mean = 579.860 us, total = 236.583 ms, Queueing time: mean = 336.823 us, max = 2.010 ms, min = 10.616 us, total = 137.424 ms [state-dump] NodeManager.GcsCheckAlive - 408 total (1 active), Execution time: mean = 298.340 us, total = 121.723 ms, Queueing time: mean = 617.403 us, max = 2.522 ms, min = 6.690 us, total = 251.900 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 408 total (0 active), Execution time: mean = 53.743 us, total = 21.927 ms, Queueing time: mean = 118.774 us, max = 4.779 ms, min = 11.561 us, total = 48.460 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 204 total (1 active), 
Execution time: mean = 1.757 ms, total = 358.506 ms, Queueing time: mean = 67.817 us, max = 163.431 us, min = 13.687 us, total = 13.835 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 34 total (1 active, 1 running), Execution time: mean = 2.594 ms, total = 88.199 ms, Queueing time: mean = 72.241 us, max = 215.454 us, min = 15.835 us, total = 2.456 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 5 total (1 active), Execution time: mean = 359.520 s, total = 1797.602 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 4 total (0 active), Execution time: mean = 
320.800 us, total = 1.283 ms, Queueing time: mean = 94.198 us, max = 160.959 us, min = 20.299 us, total = 376.792 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 6.632 us, total = 19.896 us, Queueing time: mean = 60.387 us, max = 97.290 us, min = 83.871 us, total = 181.161 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:36:54,913 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:36:56,021 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 183492 total (35 active) [state-dump] Queueing time: mean = 420.586 us, 
max = 59.826 s, min = -0.000 s, total = 77.174 s [state-dump] Execution time: mean = 9.982 ms, total = 1831.527 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 44079 total (0 active), Execution time: mean = 538.420 us, total = 23.733 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 44079 total (0 active), Execution time: mean = 35.774 us, total = 1.577 s, Queueing time: mean = 112.050 us, max = 2.841 ms, min = 2.750 us, total = 4.939 s [state-dump] ObjectManager.UpdateAvailableMemory - 20980 total (0 active), Execution time: mean = 6.108 us, total = 128.140 ms, Queueing time: mean = 109.971 us, max = 1.076 ms, min = 2.228 us, total = 2.307 s [state-dump] NodeManager.CheckGC - 20980 total (1 active), Execution time: mean = 2.843 us, total = 59.641 ms, Queueing time: mean = 98.610 us, max = 25.875 ms, min = 2.848 us, total = 2.069 s [state-dump] RaySyncer.OnDemandBroadcasting - 20980 total (1 active), Execution time: mean = 10.604 us, total = 222.464 ms, Queueing time: mean = 91.745 us, max = 25.869 ms, min = 6.166 us, total = 1.925 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 10496 total (1 active), Execution time: mean = 18.340 us, total = 192.499 ms, Queueing time: mean = 77.460 us, max = 26.386 ms, min = -0.000 s, total = 813.024 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 8383 total (1 active), Execution time: mean = 454.085 us, total = 3.807 s, Queueing time: mean = 74.852 us, max = 3.532 ms, min = -0.000 s, total = 627.482 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2100 total (1 active), Execution time: mean = 3.112 us, total = 6.535 ms, Queueing time: mean = 179.764 us, max = 2.473 ms, min = 3.958 us, total = 377.504 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2100 total (1 active), Execution 
time: mean = 9.023 us, total = 18.948 ms, Queueing time: mean = 175.723 us, max = 2.483 ms, min = 6.450 us, total = 369.018 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2100 total (1 active), Execution time: mean = 16.439 us, total = 34.522 ms, Queueing time: mean = 74.976 us, max = 2.581 ms, min = 10.666 us, total = 157.450 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2099 total (0 active), Execution time: mean = 97.865 us, total = 205.418 ms, Queueing time: mean = 113.065 us, max = 2.934 ms, min = 4.027 us, total = 237.323 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2099 total (0 active), Execution time: mean = 614.448 us, total = 1.290 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 701 total (1 active), Execution time: mean = 10.863 us, total = 7.615 ms, Queueing time: mean = 76.960 us, max = 442.307 us, min = 14.912 us, total = 53.949 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 420 total (0 active), Execution time: mean = 1.559 ms, total = 654.879 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 420 total (1 active), Execution time: mean = 578.980 us, total = 243.172 ms, Queueing time: mean = 333.529 us, max = 2.010 ms, min = 10.616 us, total = 140.082 ms [state-dump] NodeManager.GcsCheckAlive - 420 total (1 active), Execution time: mean = 297.742 us, total = 125.052 ms, Queueing time: mean = 613.581 us, max = 2.522 ms, min = 6.690 us, total = 257.704 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 420 total (0 active), Execution time: mean = 53.819 us, total = 22.604 ms, Queueing time: mean = 118.751 us, max = 4.779 ms, min = 11.561 us, total = 49.875 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 210 total (1 active), 
Execution time: mean = 1.754 ms, total = 368.244 ms, Queueing time: mean = 67.490 us, max = 163.431 us, min = 13.687 us, total = 14.173 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 35 total (1 active, 1 running), Execution time: mean = 2.603 ms, total = 91.099 ms, Queueing time: mean = 72.024 us, max = 215.454 us, min = 15.835 us, total = 2.521 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 5 total (1 active), Execution time: mean = 359.520 s, total = 1797.602 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 4 total (0 active), Execution time: mean = 
320.800 us, total = 1.283 ms, Queueing time: mean = 94.198 us, max = 160.959 us, min = 20.299 us, total = 376.792 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 6.632 us, total = 19.896 us, Queueing time: mean = 60.387 us, max = 97.290 us, min = 83.871 us, total = 181.161 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:37:54,913 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:37:56,024 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 188727 total (35 active) [state-dump] Queueing time: mean = 410.511 us, 
max = 59.826 s, min = -0.000 s, total = 77.474 s [state-dump] Execution time: mean = 9.708 ms, total = 1832.241 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 45339 total (0 active), Execution time: mean = 534.298 us, total = 24.225 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 45339 total (0 active), Execution time: mean = 35.500 us, total = 1.610 s, Queueing time: mean = 110.891 us, max = 2.841 ms, min = 2.750 us, total = 5.028 s [state-dump] ObjectManager.UpdateAvailableMemory - 21580 total (0 active), Execution time: mean = 6.062 us, total = 130.826 ms, Queueing time: mean = 109.029 us, max = 1.076 ms, min = 2.228 us, total = 2.353 s [state-dump] NodeManager.CheckGC - 21580 total (1 active), Execution time: mean = 2.837 us, total = 61.222 ms, Queueing time: mean = 98.071 us, max = 25.875 ms, min = 2.848 us, total = 2.116 s [state-dump] RaySyncer.OnDemandBroadcasting - 21580 total (1 active), Execution time: mean = 10.545 us, total = 227.554 ms, Queueing time: mean = 91.256 us, max = 25.869 ms, min = 6.166 us, total = 1.969 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 10796 total (1 active), Execution time: mean = 18.234 us, total = 196.853 ms, Queueing time: mean = 76.955 us, max = 26.386 ms, min = -0.000 s, total = 830.809 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 8623 total (1 active), Execution time: mean = 453.251 us, total = 3.908 s, Queueing time: mean = 74.364 us, max = 3.532 ms, min = -0.000 s, total = 641.243 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2160 total (1 active), Execution time: mean = 3.102 us, total = 6.701 ms, Queueing time: mean = 179.742 us, max = 2.473 ms, min = 3.958 us, total = 388.243 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2160 total (1 active), Execution 
time: mean = 8.971 us, total = 19.377 ms, Queueing time: mean = 175.727 us, max = 2.483 ms, min = 6.450 us, total = 379.571 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2160 total (1 active), Execution time: mean = 16.348 us, total = 35.312 ms, Queueing time: mean = 74.404 us, max = 2.581 ms, min = 10.666 us, total = 160.714 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2159 total (0 active), Execution time: mean = 97.469 us, total = 210.435 ms, Queueing time: mean = 111.705 us, max = 2.934 ms, min = 4.027 us, total = 241.172 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2159 total (0 active), Execution time: mean = 610.290 us, total = 1.318 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 721 total (1 active), Execution time: mean = 10.760 us, total = 7.758 ms, Queueing time: mean = 76.203 us, max = 442.307 us, min = 14.912 us, total = 54.942 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 432 total (0 active), Execution time: mean = 1.553 ms, total = 671.028 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 432 total (1 active), Execution time: mean = 578.243 us, total = 249.801 ms, Queueing time: mean = 334.188 us, max = 2.010 ms, min = 10.616 us, total = 144.369 ms [state-dump] NodeManager.GcsCheckAlive - 432 total (1 active), Execution time: mean = 297.769 us, total = 128.636 ms, Queueing time: mean = 613.774 us, max = 2.522 ms, min = 6.690 us, total = 265.151 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 432 total (0 active), Execution time: mean = 53.687 us, total = 23.193 ms, Queueing time: mean = 117.558 us, max = 4.779 ms, min = 11.561 us, total = 50.785 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 216 total (1 active), 
Execution time: mean = 1.753 ms, total = 378.601 ms, Queueing time: mean = 66.864 us, max = 163.431 us, min = 13.687 us, total = 14.443 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 36 total (1 active, 1 running), Execution time: mean = 2.612 ms, total = 94.017 ms, Queueing time: mean = 72.248 us, max = 215.454 us, min = 15.835 us, total = 2.601 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 5 total (1 active), Execution time: mean = 359.520 s, total = 1797.602 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 4 total (0 active), Execution time: mean = 
320.800 us, total = 1.283 ms, Queueing time: mean = 94.198 us, max = 160.959 us, min = 20.299 us, total = 376.792 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 6.632 us, total = 19.896 us, Queueing time: mean = 60.387 us, max = 97.290 us, min = 83.871 us, total = 181.161 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:38:54,913 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:38:56,027 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 193958 total (35 active) [state-dump] Queueing time: mean = 400.975 us, 
max = 59.826 s, min = -0.000 s, total = 77.772 s [state-dump] Execution time: mean = 9.450 ms, total = 1832.930 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 46599 total (0 active), Execution time: mean = 529.871 us, total = 24.691 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 46599 total (0 active), Execution time: mean = 35.286 us, total = 1.644 s, Queueing time: mean = 109.822 us, max = 2.841 ms, min = 1.846 us, total = 5.118 s [state-dump] ObjectManager.UpdateAvailableMemory - 22179 total (0 active), Execution time: mean = 6.017 us, total = 133.448 ms, Queueing time: mean = 107.956 us, max = 1.662 ms, min = 2.228 us, total = 2.394 s [state-dump] NodeManager.CheckGC - 22179 total (1 active), Execution time: mean = 2.832 us, total = 62.812 ms, Queueing time: mean = 97.589 us, max = 25.875 ms, min = 2.848 us, total = 2.164 s [state-dump] RaySyncer.OnDemandBroadcasting - 22179 total (1 active), Execution time: mean = 10.489 us, total = 232.629 ms, Queueing time: mean = 90.822 us, max = 25.869 ms, min = 6.166 us, total = 2.014 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 11096 total (1 active), Execution time: mean = 18.101 us, total = 200.846 ms, Queueing time: mean = 76.575 us, max = 26.386 ms, min = -0.000 s, total = 849.672 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 8862 total (1 active), Execution time: mean = 452.324 us, total = 4.008 s, Queueing time: mean = 73.959 us, max = 3.532 ms, min = -0.000 s, total = 655.427 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2220 total (1 active), Execution time: mean = 3.099 us, total = 6.880 ms, Queueing time: mean = 179.508 us, max = 2.473 ms, min = 3.958 us, total = 398.507 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2220 total (1 active), Execution 
time: mean = 8.923 us, total = 19.810 ms, Queueing time: mean = 175.517 us, max = 2.483 ms, min = 6.450 us, total = 389.648 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2220 total (1 active), Execution time: mean = 16.303 us, total = 36.193 ms, Queueing time: mean = 74.055 us, max = 2.581 ms, min = 10.666 us, total = 164.402 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2219 total (0 active), Execution time: mean = 97.133 us, total = 215.538 ms, Queueing time: mean = 110.515 us, max = 2.934 ms, min = 4.027 us, total = 245.233 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2219 total (0 active), Execution time: mean = 605.306 us, total = 1.343 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 741 total (1 active), Execution time: mean = 10.680 us, total = 7.914 ms, Queueing time: mean = 75.581 us, max = 442.307 us, min = 14.912 us, total = 56.005 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 444 total (0 active), Execution time: mean = 1.549 ms, total = 687.670 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 444 total (1 active), Execution time: mean = 578.132 us, total = 256.691 ms, Queueing time: mean = 332.913 us, max = 2.010 ms, min = 10.616 us, total = 147.813 ms [state-dump] NodeManager.GcsCheckAlive - 444 total (1 active), Execution time: mean = 299.081 us, total = 132.792 ms, Queueing time: mean = 611.259 us, max = 2.522 ms, min = 6.690 us, total = 271.399 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 444 total (0 active), Execution time: mean = 53.602 us, total = 23.799 ms, Queueing time: mean = 116.854 us, max = 4.779 ms, min = 11.561 us, total = 51.883 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 222 total (1 active), 
Execution time: mean = 1.751 ms, total = 388.713 ms, Queueing time: mean = 66.619 us, max = 163.431 us, min = 13.687 us, total = 14.789 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 37 total (1 active, 1 running), Execution time: mean = 2.619 ms, total = 96.897 ms, Queueing time: mean = 72.122 us, max = 215.454 us, min = 15.835 us, total = 2.669 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 5 total (1 active), Execution time: mean = 359.520 s, total = 1797.602 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 4 total (0 active), Execution time: mean = 
320.800 us, total = 1.283 ms, Queueing time: mean = 94.198 us, max = 160.959 us, min = 20.299 us, total = 376.792 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 6.632 us, total = 19.896 us, Queueing time: mean = 60.387 us, max = 97.290 us, min = 83.871 us, total = 181.161 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:39:54,914 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:39:56,030 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 199193 total (35 active) [state-dump] Queueing time: mean = 392.123 us, 
max = 59.826 s, min = -0.000 s, total = 78.108 s [state-dump] Execution time: mean = 9.206 ms, total = 1833.698 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 47859 total (0 active), Execution time: mean = 527.174 us, total = 25.230 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 47859 total (0 active), Execution time: mean = 35.155 us, total = 1.682 s, Queueing time: mean = 109.273 us, max = 2.841 ms, min = 1.846 us, total = 5.230 s [state-dump] ObjectManager.UpdateAvailableMemory - 22779 total (0 active), Execution time: mean = 5.988 us, total = 136.392 ms, Queueing time: mean = 107.429 us, max = 1.662 ms, min = 2.228 us, total = 2.447 s [state-dump] NodeManager.CheckGC - 22779 total (1 active), Execution time: mean = 2.826 us, total = 64.384 ms, Queueing time: mean = 97.112 us, max = 25.875 ms, min = 2.848 us, total = 2.212 s [state-dump] RaySyncer.OnDemandBroadcasting - 22779 total (1 active), Execution time: mean = 10.440 us, total = 237.806 ms, Queueing time: mean = 90.388 us, max = 25.869 ms, min = 6.166 us, total = 2.059 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 11396 total (1 active), Execution time: mean = 18.015 us, total = 205.298 ms, Queueing time: mean = 76.339 us, max = 26.386 ms, min = -0.000 s, total = 869.954 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 9102 total (1 active), Execution time: mean = 451.506 us, total = 4.110 s, Queueing time: mean = 73.658 us, max = 3.532 ms, min = -0.000 s, total = 670.438 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2280 total (1 active), Execution time: mean = 3.094 us, total = 7.054 ms, Queueing time: mean = 179.266 us, max = 2.473 ms, min = 3.958 us, total = 408.726 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2280 total (1 active), Execution 
time: mean = 8.901 us, total = 20.295 ms, Queueing time: mean = 175.283 us, max = 2.483 ms, min = 6.450 us, total = 399.646 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2280 total (1 active), Execution time: mean = 16.256 us, total = 37.063 ms, Queueing time: mean = 73.752 us, max = 2.581 ms, min = 10.666 us, total = 168.155 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2279 total (0 active), Execution time: mean = 96.921 us, total = 220.884 ms, Queueing time: mean = 110.006 us, max = 2.934 ms, min = 4.027 us, total = 250.705 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2279 total (0 active), Execution time: mean = 602.304 us, total = 1.373 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 761 total (1 active), Execution time: mean = 10.594 us, total = 8.062 ms, Queueing time: mean = 75.189 us, max = 442.307 us, min = 14.912 us, total = 57.219 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 456 total (0 active), Execution time: mean = 1.543 ms, total = 703.669 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 456 total (1 active), Execution time: mean = 576.411 us, total = 262.843 ms, Queueing time: mean = 333.268 us, max = 2.010 ms, min = 10.616 us, total = 151.970 ms [state-dump] NodeManager.GcsCheckAlive - 456 total (1 active), Execution time: mean = 298.651 us, total = 136.185 ms, Queueing time: mean = 610.119 us, max = 2.522 ms, min = 6.690 us, total = 278.214 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 456 total (0 active), Execution time: mean = 53.530 us, total = 24.410 ms, Queueing time: mean = 116.222 us, max = 4.779 ms, min = 11.561 us, total = 52.997 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 228 total (1 active), 
Execution time: mean = 1.750 ms, total = 399.110 ms, Queueing time: mean = 66.440 us, max = 163.431 us, min = 13.687 us, total = 15.148 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 38 total (1 active, 1 running), Execution time: mean = 2.627 ms, total = 99.831 ms, Queueing time: mean = 73.999 us, max = 215.454 us, min = 15.835 us, total = 2.812 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 5 total (1 active), Execution time: mean = 359.520 s, total = 1797.602 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 4 total (0 active), Execution time: mean = 
320.800 us, total = 1.283 ms, Queueing time: mean = 94.198 us, max = 160.959 us, min = 20.299 us, total = 376.792 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 6.632 us, total = 19.896 us, Queueing time: mean = 60.387 us, max = 97.290 us, min = 83.871 us, total = 181.161 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:40:54,914 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:40:56,033 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 204424 total (35 active) [state-dump] Queueing time: mean = 383.671 us, 
max = 59.826 s, min = -0.000 s, total = 78.432 s [state-dump] Execution time: mean = 8.973 ms, total = 1834.393 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 49119 total (0 active), Execution time: mean = 523.460 us, total = 25.712 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 49119 total (0 active), Execution time: mean = 34.931 us, total = 1.716 s, Queueing time: mean = 108.534 us, max = 2.841 ms, min = 1.846 us, total = 5.331 s [state-dump] ObjectManager.UpdateAvailableMemory - 23378 total (0 active), Execution time: mean = 5.945 us, total = 138.989 ms, Queueing time: mean = 107.002 us, max = 1.662 ms, min = 2.228 us, total = 2.501 s [state-dump] NodeManager.CheckGC - 23378 total (1 active), Execution time: mean = 2.816 us, total = 65.843 ms, Queueing time: mean = 96.575 us, max = 25.875 ms, min = 2.848 us, total = 2.258 s [state-dump] RaySyncer.OnDemandBroadcasting - 23378 total (1 active), Execution time: mean = 10.364 us, total = 242.284 ms, Queueing time: mean = 89.913 us, max = 25.869 ms, min = 6.166 us, total = 2.102 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 11696 total (1 active), Execution time: mean = 17.858 us, total = 208.869 ms, Queueing time: mean = 76.035 us, max = 26.386 ms, min = -0.000 s, total = 889.306 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 9341 total (1 active), Execution time: mean = 450.209 us, total = 4.205 s, Queueing time: mean = 73.338 us, max = 3.532 ms, min = -0.000 s, total = 685.049 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2340 total (1 active), Execution time: mean = 3.079 us, total = 7.206 ms, Queueing time: mean = 179.287 us, max = 2.473 ms, min = 3.958 us, total = 419.530 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2340 total (1 active), Execution 
time: mean = 8.863 us, total = 20.740 ms, Queueing time: mean = 175.321 us, max = 2.483 ms, min = 6.450 us, total = 410.251 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2340 total (1 active), Execution time: mean = 16.147 us, total = 37.783 ms, Queueing time: mean = 73.269 us, max = 2.581 ms, min = 10.666 us, total = 171.450 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2339 total (0 active), Execution time: mean = 96.494 us, total = 225.699 ms, Queueing time: mean = 109.313 us, max = 2.934 ms, min = 4.027 us, total = 255.682 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2339 total (0 active), Execution time: mean = 598.500 us, total = 1.400 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 781 total (1 active), Execution time: mean = 10.476 us, total = 8.182 ms, Queueing time: mean = 74.473 us, max = 442.307 us, min = 14.912 us, total = 58.163 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 468 total (0 active), Execution time: mean = 1.537 ms, total = 719.120 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 468 total (1 active), Execution time: mean = 574.598 us, total = 268.912 ms, Queueing time: mean = 335.557 us, max = 2.010 ms, min = 10.616 us, total = 157.041 ms [state-dump] NodeManager.GcsCheckAlive - 468 total (1 active), Execution time: mean = 297.975 us, total = 139.452 ms, Queueing time: mean = 611.124 us, max = 2.522 ms, min = 6.690 us, total = 286.006 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 468 total (0 active), Execution time: mean = 53.436 us, total = 25.008 ms, Queueing time: mean = 115.561 us, max = 4.779 ms, min = 11.561 us, total = 54.082 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 234 total (1 active), 
Execution time: mean = 1.750 ms, total = 409.427 ms, Queueing time: mean = 66.968 us, max = 163.431 us, min = 13.687 us, total = 15.671 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 39 total (1 active, 1 running), Execution time: mean = 2.638 ms, total = 102.901 ms, Queueing time: mean = 73.031 us, max = 215.454 us, min = 15.835 us, total = 2.848 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 5 total (1 active), Execution time: mean = 359.520 s, total = 1797.602 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 4 total (0 active), Execution time: mean = 
320.800 us, total = 1.283 ms, Queueing time: mean = 94.198 us, max = 160.959 us, min = 20.299 us, total = 376.792 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 6.632 us, total = 19.896 us, Queueing time: mean = 60.387 us, max = 97.290 us, min = 83.871 us, total = 181.161 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:41:54,914 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:41:56,036 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 209661 total (35 active) [state-dump] Queueing time: mean = 375.917 us, 
max = 59.826 s, min = -0.000 s, total = 78.815 s [state-dump] Execution time: mean = 11.615 ms, total = 2435.262 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 50379 total (0 active), Execution time: mean = 522.871 us, total = 26.342 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 50379 total (0 active), Execution time: mean = 34.842 us, total = 1.755 s, Queueing time: mean = 108.397 us, max = 2.841 ms, min = 1.846 us, total = 5.461 s [state-dump] ObjectManager.UpdateAvailableMemory - 23978 total (0 active), Execution time: mean = 5.935 us, total = 142.303 ms, Queueing time: mean = 107.126 us, max = 1.662 ms, min = 2.228 us, total = 2.569 s [state-dump] NodeManager.CheckGC - 23978 total (1 active), Execution time: mean = 2.815 us, total = 67.487 ms, Queueing time: mean = 96.387 us, max = 25.875 ms, min = 2.848 us, total = 2.311 s [state-dump] RaySyncer.OnDemandBroadcasting - 23978 total (1 active), Execution time: mean = 10.340 us, total = 247.931 ms, Queueing time: mean = 89.747 us, max = 25.869 ms, min = 6.166 us, total = 2.152 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 11996 total (1 active), Execution time: mean = 17.788 us, total = 213.385 ms, Queueing time: mean = 75.847 us, max = 26.386 ms, min = -0.000 s, total = 909.858 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 9581 total (1 active), Execution time: mean = 449.797 us, total = 4.310 s, Queueing time: mean = 73.380 us, max = 3.532 ms, min = -0.000 s, total = 703.058 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2400 total (1 active), Execution time: mean = 3.074 us, total = 7.379 ms, Queueing time: mean = 179.075 us, max = 2.473 ms, min = 3.958 us, total = 429.780 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2400 total (1 active), Execution 
time: mean = 8.849 us, total = 21.238 ms, Queueing time: mean = 175.113 us, max = 2.483 ms, min = 6.450 us, total = 420.271 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2400 total (1 active), Execution time: mean = 16.122 us, total = 38.693 ms, Queueing time: mean = 73.091 us, max = 2.581 ms, min = 10.666 us, total = 175.420 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2399 total (0 active), Execution time: mean = 96.298 us, total = 231.020 ms, Queueing time: mean = 109.037 us, max = 2.934 ms, min = 4.027 us, total = 261.581 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2399 total (0 active), Execution time: mean = 597.290 us, total = 1.433 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 801 total (1 active), Execution time: mean = 10.428 us, total = 8.353 ms, Queueing time: mean = 74.417 us, max = 442.307 us, min = 14.912 us, total = 59.608 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 480 total (0 active), Execution time: mean = 1.533 ms, total = 735.823 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 480 total (1 active), Execution time: mean = 573.046 us, total = 275.062 ms, Queueing time: mean = 335.772 us, max = 2.010 ms, min = 10.616 us, total = 161.170 ms [state-dump] NodeManager.GcsCheckAlive - 480 total (1 active), Execution time: mean = 297.346 us, total = 142.726 ms, Queueing time: mean = 610.485 us, max = 2.522 ms, min = 6.690 us, total = 293.033 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 480 total (0 active), Execution time: mean = 53.337 us, total = 25.602 ms, Queueing time: mean = 115.500 us, max = 4.779 ms, min = 11.561 us, total = 55.440 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 240 total (1 active), 
Execution time: mean = 1.748 ms, total = 419.510 ms, Queueing time: mean = 66.567 us, max = 163.431 us, min = 13.687 us, total = 15.976 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 40 total (1 active, 1 running), Execution time: mean = 2.642 ms, total = 105.695 ms, Queueing time: mean = 72.869 us, max = 215.454 us, min = 15.835 us, total = 2.915 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 6 total (1 active), Execution time: mean = 399.600 s, total = 2397.602 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 5 total (0 active), Execution time: mean = 
329.314 us, total = 1.647 ms, Queueing time: mean = 112.016 us, max = 183.286 us, min = 20.299 us, total = 560.078 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 6.632 us, total = 19.896 us, Queueing time: mean = 60.387 us, max = 97.290 us, min = 83.871 us, total = 181.161 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:42:54,914 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:42:56,038 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 214893 total (35 active) [state-dump] Queueing time: mean = 368.743 us, 
max = 59.826 s, min = -0.000 s, total = 79.240 s [state-dump] Execution time: mean = 11.337 ms, total = 2436.227 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 51639 total (0 active), Execution time: mean = 523.679 us, total = 27.042 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 51639 total (0 active), Execution time: mean = 34.843 us, total = 1.799 s, Queueing time: mean = 108.605 us, max = 2.841 ms, min = 1.846 us, total = 5.608 s [state-dump] ObjectManager.UpdateAvailableMemory - 24577 total (0 active), Execution time: mean = 5.946 us, total = 146.129 ms, Queueing time: mean = 107.390 us, max = 1.662 ms, min = 2.228 us, total = 2.639 s [state-dump] NodeManager.CheckGC - 24577 total (1 active), Execution time: mean = 2.817 us, total = 69.224 ms, Queueing time: mean = 96.441 us, max = 25.875 ms, min = 2.848 us, total = 2.370 s [state-dump] RaySyncer.OnDemandBroadcasting - 24577 total (1 active), Execution time: mean = 10.344 us, total = 254.235 ms, Queueing time: mean = 89.800 us, max = 25.869 ms, min = 6.166 us, total = 2.207 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 12296 total (1 active), Execution time: mean = 17.804 us, total = 218.920 ms, Queueing time: mean = 75.924 us, max = 26.386 ms, min = -0.000 s, total = 933.568 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 9821 total (1 active), Execution time: mean = 450.260 us, total = 4.422 s, Queueing time: mean = 73.483 us, max = 3.532 ms, min = -0.000 s, total = 721.676 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2460 total (1 active), Execution time: mean = 3.084 us, total = 7.587 ms, Queueing time: mean = 179.375 us, max = 2.473 ms, min = 3.958 us, total = 441.261 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2460 total (1 active), Execution 
time: mean = 8.871 us, total = 21.822 ms, Queueing time: mean = 175.410 us, max = 2.483 ms, min = 6.450 us, total = 431.508 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2460 total (1 active), Execution time: mean = 16.139 us, total = 39.701 ms, Queueing time: mean = 73.147 us, max = 2.581 ms, min = 10.666 us, total = 179.941 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2459 total (0 active), Execution time: mean = 96.372 us, total = 236.979 ms, Queueing time: mean = 109.119 us, max = 2.934 ms, min = 4.027 us, total = 268.323 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2459 total (0 active), Execution time: mean = 597.676 us, total = 1.470 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 821 total (1 active), Execution time: mean = 10.396 us, total = 8.535 ms, Queueing time: mean = 75.008 us, max = 442.307 us, min = 14.912 us, total = 61.582 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 492 total (0 active), Execution time: mean = 1.535 ms, total = 755.437 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 492 total (1 active), Execution time: mean = 574.432 us, total = 282.621 ms, Queueing time: mean = 336.016 us, max = 2.010 ms, min = 10.616 us, total = 165.320 ms [state-dump] NodeManager.GcsCheckAlive - 492 total (1 active), Execution time: mean = 297.243 us, total = 146.243 ms, Queueing time: mean = 612.386 us, max = 2.522 ms, min = 6.690 us, total = 301.294 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 492 total (0 active), Execution time: mean = 53.384 us, total = 26.265 ms, Queueing time: mean = 116.274 us, max = 4.779 ms, min = 11.561 us, total = 57.207 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 246 total (1 active), 
Execution time: mean = 1.751 ms, total = 430.632 ms, Queueing time: mean = 66.549 us, max = 163.431 us, min = 13.687 us, total = 16.371 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 41 total (1 active, 1 running), Execution time: mean = 2.649 ms, total = 108.616 ms, Queueing time: mean = 72.753 us, max = 215.454 us, min = 15.835 us, total = 2.983 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 6 total (1 active), Execution time: mean = 399.600 s, total = 2397.602 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 5 total (0 active), Execution time: mean = 
329.314 us, total = 1.647 ms, Queueing time: mean = 112.016 us, max = 183.286 us, min = 20.299 us, total = 560.078 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 6.632 us, total = 19.896 us, Queueing time: mean = 60.387 us, max = 97.290 us, min = 83.871 us, total = 181.161 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:43:54,915 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:43:56,041 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 220126 total (35 active) [state-dump] Queueing time: mean = 361.813 us, 
max = 59.826 s, min = -0.001 s, total = 79.644 s [state-dump] Execution time: mean = 11.072 ms, total = 2437.142 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 52899 total (0 active), Execution time: mean = 523.559 us, total = 27.696 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 52899 total (0 active), Execution time: mean = 34.802 us, total = 1.841 s, Queueing time: mean = 108.585 us, max = 2.841 ms, min = 1.846 us, total = 5.744 s [state-dump] ObjectManager.UpdateAvailableMemory - 25177 total (0 active), Execution time: mean = 5.951 us, total = 149.826 ms, Queueing time: mean = 107.446 us, max = 1.662 ms, min = 2.228 us, total = 2.705 s [state-dump] NodeManager.CheckGC - 25177 total (1 active), Execution time: mean = 2.818 us, total = 70.961 ms, Queueing time: mean = 96.429 us, max = 25.875 ms, min = 2.848 us, total = 2.428 s [state-dump] RaySyncer.OnDemandBroadcasting - 25177 total (1 active), Execution time: mean = 10.345 us, total = 260.463 ms, Queueing time: mean = 89.789 us, max = 25.869 ms, min = 6.166 us, total = 2.261 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 12595 total (1 active), Execution time: mean = 17.934 us, total = 225.876 ms, Queueing time: mean = 75.921 us, max = 26.386 ms, min = -0.001 s, total = 956.228 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 10060 total (1 active), Execution time: mean = 450.572 us, total = 4.533 s, Queueing time: mean = 73.520 us, max = 3.532 ms, min = -0.000 s, total = 739.608 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2520 total (1 active), Execution time: mean = 3.081 us, total = 7.765 ms, Queueing time: mean = 179.790 us, max = 2.946 ms, min = 3.958 us, total = 453.070 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2520 total (1 active), Execution 
time: mean = 8.866 us, total = 22.343 ms, Queueing time: mean = 175.822 us, max = 2.947 ms, min = 6.450 us, total = 443.071 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2520 total (1 active), Execution time: mean = 16.172 us, total = 40.753 ms, Queueing time: mean = 73.182 us, max = 2.581 ms, min = 10.666 us, total = 184.419 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2519 total (0 active), Execution time: mean = 96.440 us, total = 242.932 ms, Queueing time: mean = 109.192 us, max = 2.934 ms, min = 4.027 us, total = 275.054 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2519 total (0 active), Execution time: mean = 598.661 us, total = 1.508 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 841 total (1 active), Execution time: mean = 10.379 us, total = 8.729 ms, Queueing time: mean = 74.990 us, max = 442.307 us, min = 14.912 us, total = 63.066 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 504 total (0 active), Execution time: mean = 1.538 ms, total = 775.082 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 504 total (1 active), Execution time: mean = 574.676 us, total = 289.637 ms, Queueing time: mean = 337.841 us, max = 2.010 ms, min = 10.616 us, total = 170.272 ms [state-dump] NodeManager.GcsCheckAlive - 504 total (1 active), Execution time: mean = 297.974 us, total = 150.179 ms, Queueing time: mean = 613.725 us, max = 2.567 ms, min = 6.690 us, total = 309.317 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 504 total (0 active), Execution time: mean = 53.450 us, total = 26.939 ms, Queueing time: mean = 115.944 us, max = 4.779 ms, min = 11.561 us, total = 58.436 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 252 total (1 active), 
Execution time: mean = 1.753 ms, total = 441.882 ms, Queueing time: mean = 66.602 us, max = 163.431 us, min = 13.687 us, total = 16.784 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 42 total (1 active, 1 running), Execution time: mean = 2.639 ms, total = 110.830 ms, Queueing time: mean = 72.276 us, max = 215.454 us, min = 15.835 us, total = 3.036 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 6 total (1 active), Execution time: mean = 399.600 s, total = 2397.602 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 5 total (0 active), Execution time: mean = 
329.314 us, total = 1.647 ms, Queueing time: mean = 112.016 us, max = 183.286 us, min = 20.299 us, total = 560.078 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 6.632 us, total = 19.896 us, Queueing time: mean = 60.387 us, max = 97.290 us, min = 83.871 us, total = 181.161 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:44:54,915 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:44:56,044 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 225358 total (35 active) [state-dump] Queueing time: mean = 355.071 us, 
max = 59.826 s, min = -0.001 s, total = 80.018 s [state-dump] Execution time: mean = 10.818 ms, total = 2438.025 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 54159 total (0 active), Execution time: mean = 522.953 us, total = 28.323 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 54159 total (0 active), Execution time: mean = 34.745 us, total = 1.882 s, Queueing time: mean = 108.330 us, max = 2.841 ms, min = 1.846 us, total = 5.867 s [state-dump] ObjectManager.UpdateAvailableMemory - 25776 total (0 active), Execution time: mean = 5.945 us, total = 153.247 ms, Queueing time: mean = 107.231 us, max = 1.662 ms, min = 2.228 us, total = 2.764 s [state-dump] NodeManager.CheckGC - 25776 total (1 active), Execution time: mean = 2.817 us, total = 72.609 ms, Queueing time: mean = 96.333 us, max = 25.875 ms, min = 2.848 us, total = 2.483 s [state-dump] RaySyncer.OnDemandBroadcasting - 25776 total (1 active), Execution time: mean = 10.341 us, total = 266.560 ms, Queueing time: mean = 89.695 us, max = 25.869 ms, min = 6.166 us, total = 2.312 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 12895 total (1 active), Execution time: mean = 17.910 us, total = 230.953 ms, Queueing time: mean = 75.716 us, max = 26.386 ms, min = -0.001 s, total = 976.360 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 10300 total (1 active), Execution time: mean = 450.582 us, total = 4.641 s, Queueing time: mean = 73.390 us, max = 3.532 ms, min = -0.000 s, total = 755.919 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2580 total (1 active), Execution time: mean = 3.083 us, total = 7.955 ms, Queueing time: mean = 180.066 us, max = 2.946 ms, min = 3.958 us, total = 464.570 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2580 total (1 active), Execution 
time: mean = 8.867 us, total = 22.876 ms, Queueing time: mean = 176.104 us, max = 2.947 ms, min = 6.450 us, total = 454.350 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2580 total (1 active), Execution time: mean = 16.172 us, total = 41.724 ms, Queueing time: mean = 73.125 us, max = 2.581 ms, min = 10.666 us, total = 188.663 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2579 total (0 active), Execution time: mean = 96.422 us, total = 248.674 ms, Queueing time: mean = 109.045 us, max = 2.934 ms, min = 4.027 us, total = 281.227 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2579 total (0 active), Execution time: mean = 598.676 us, total = 1.544 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 861 total (1 active), Execution time: mean = 10.357 us, total = 8.917 ms, Queueing time: mean = 75.108 us, max = 442.307 us, min = 14.912 us, total = 64.668 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 516 total (0 active), Execution time: mean = 1.542 ms, total = 795.568 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 516 total (1 active), Execution time: mean = 576.118 us, total = 297.277 ms, Queueing time: mean = 338.078 us, max = 2.010 ms, min = 10.616 us, total = 174.448 ms [state-dump] NodeManager.GcsCheckAlive - 516 total (1 active), Execution time: mean = 298.485 us, total = 154.018 ms, Queueing time: mean = 614.679 us, max = 2.567 ms, min = 6.690 us, total = 317.174 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 516 total (0 active), Execution time: mean = 53.514 us, total = 27.613 ms, Queueing time: mean = 115.863 us, max = 4.779 ms, min = 11.561 us, total = 59.785 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 258 total (1 active), 
Execution time: mean = 1.758 ms, total = 453.555 ms, Queueing time: mean = 66.769 us, max = 163.431 us, min = 13.687 us, total = 17.226 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 43 total (1 active, 1 running), Execution time: mean = 2.645 ms, total = 113.730 ms, Queueing time: mean = 73.147 us, max = 215.454 us, min = 15.835 us, total = 3.145 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 6 total (1 active), Execution time: mean = 399.600 s, total = 2397.602 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 5 total (0 active), Execution time: mean = 
329.314 us, total = 1.647 ms, Queueing time: mean = 112.016 us, max = 183.286 us, min = 20.299 us, total = 560.078 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 6.632 us, total = 19.896 us, Queueing time: mean = 60.387 us, max = 97.290 us, min = 83.871 us, total = 181.161 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:45:54,915 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:45:56,047 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 230592 total (35 active) [state-dump] Queueing time: mean = 348.890 us, 
max = 59.826 s, min = -0.001 s, total = 80.451 s [state-dump] Execution time: mean = 10.577 ms, total = 2438.994 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 55419 total (0 active), Execution time: mean = 523.697 us, total = 29.023 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 55419 total (0 active), Execution time: mean = 34.789 us, total = 1.928 s, Queueing time: mean = 108.535 us, max = 2.841 ms, min = 1.846 us, total = 6.015 s [state-dump] ObjectManager.UpdateAvailableMemory - 26376 total (0 active), Execution time: mean = 5.956 us, total = 157.105 ms, Queueing time: mean = 107.541 us, max = 1.662 ms, min = 2.228 us, total = 2.837 s [state-dump] NodeManager.CheckGC - 26376 total (1 active), Execution time: mean = 2.820 us, total = 74.367 ms, Queueing time: mean = 96.444 us, max = 25.875 ms, min = 2.848 us, total = 2.544 s [state-dump] RaySyncer.OnDemandBroadcasting - 26376 total (1 active), Execution time: mean = 10.350 us, total = 272.984 ms, Queueing time: mean = 89.802 us, max = 25.869 ms, min = 6.166 us, total = 2.369 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 13195 total (1 active), Execution time: mean = 17.905 us, total = 236.255 ms, Queueing time: mean = 75.719 us, max = 26.386 ms, min = -0.001 s, total = 999.111 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 10539 total (1 active), Execution time: mean = 450.911 us, total = 4.752 s, Queueing time: mean = 73.558 us, max = 3.532 ms, min = -0.000 s, total = 775.232 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2640 total (1 active), Execution time: mean = 3.082 us, total = 8.136 ms, Queueing time: mean = 180.657 us, max = 2.946 ms, min = 3.958 us, total = 476.934 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2640 total (1 active), Execution 
time: mean = 8.863 us, total = 23.399 ms, Queueing time: mean = 176.695 us, max = 2.947 ms, min = 6.450 us, total = 466.474 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2640 total (1 active), Execution time: mean = 16.198 us, total = 42.763 ms, Queueing time: mean = 73.188 us, max = 2.581 ms, min = 10.666 us, total = 193.217 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2639 total (0 active), Execution time: mean = 96.440 us, total = 254.504 ms, Queueing time: mean = 109.395 us, max = 2.934 ms, min = 4.027 us, total = 288.693 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2639 total (0 active), Execution time: mean = 599.795 us, total = 1.583 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 881 total (1 active), Execution time: mean = 10.334 us, total = 9.104 ms, Queueing time: mean = 75.076 us, max = 442.307 us, min = 14.912 us, total = 66.142 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 528 total (0 active), Execution time: mean = 1.547 ms, total = 816.686 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 528 total (1 active), Execution time: mean = 576.529 us, total = 304.407 ms, Queueing time: mean = 340.074 us, max = 2.010 ms, min = 10.616 us, total = 179.559 ms [state-dump] NodeManager.GcsCheckAlive - 528 total (1 active), Execution time: mean = 299.616 us, total = 158.197 ms, Queueing time: mean = 616.292 us, max = 2.567 ms, min = 6.690 us, total = 325.402 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 528 total (0 active), Execution time: mean = 53.715 us, total = 28.362 ms, Queueing time: mean = 116.216 us, max = 4.779 ms, min = 11.561 us, total = 61.362 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 264 total (1 active), 
Execution time: mean = 1.762 ms, total = 465.126 ms, Queueing time: mean = 67.069 us, max = 163.431 us, min = 13.687 us, total = 17.706 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 44 total (1 active, 1 running), Execution time: mean = 2.649 ms, total = 116.577 ms, Queueing time: mean = 72.868 us, max = 215.454 us, min = 15.835 us, total = 3.206 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 6 total (1 active), Execution time: mean = 399.600 s, total = 2397.602 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 5 total (0 active), Execution time: mean = 
329.314 us, total = 1.647 ms, Queueing time: mean = 112.016 us, max = 183.286 us, min = 20.299 us, total = 560.078 us [state-dump] NodeManager.GCTaskFailureReason - 3 total (1 active), Execution time: mean = 6.632 us, total = 19.896 us, Queueing time: mean = 60.387 us, max = 97.290 us, min = 83.871 us, total = 181.161 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:46:54,916 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:46:56,050 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 235825 total (35 active) [state-dump] Queueing time: mean = 342.900 us, 
max = 59.826 s, min = -0.001 s, total = 80.864 s [state-dump] Execution time: mean = 10.346 ms, total = 2439.931 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 56679 total (0 active), Execution time: mean = 523.966 us, total = 29.698 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 56679 total (0 active), Execution time: mean = 34.801 us, total = 1.972 s, Queueing time: mean = 108.616 us, max = 2.841 ms, min = 1.846 us, total = 6.156 s [state-dump] ObjectManager.UpdateAvailableMemory - 26975 total (0 active), Execution time: mean = 5.963 us, total = 160.843 ms, Queueing time: mean = 107.678 us, max = 1.662 ms, min = 2.228 us, total = 2.905 s [state-dump] NodeManager.CheckGC - 26975 total (1 active), Execution time: mean = 2.822 us, total = 76.118 ms, Queueing time: mean = 96.456 us, max = 25.875 ms, min = 2.848 us, total = 2.602 s [state-dump] RaySyncer.OnDemandBroadcasting - 26975 total (1 active), Execution time: mean = 10.352 us, total = 279.257 ms, Queueing time: mean = 89.812 us, max = 25.869 ms, min = 6.166 us, total = 2.423 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 13495 total (1 active), Execution time: mean = 17.937 us, total = 242.063 ms, Queueing time: mean = 75.769 us, max = 26.386 ms, min = -0.001 s, total = 1.023 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 10779 total (1 active), Execution time: mean = 451.121 us, total = 4.863 s, Queueing time: mean = 73.642 us, max = 3.532 ms, min = -0.000 s, total = 793.784 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2700 total (1 active), Execution time: mean = 3.080 us, total = 8.316 ms, Queueing time: mean = 180.772 us, max = 2.946 ms, min = 3.958 us, total = 488.085 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2700 total (1 active), Execution 
time: mean = 8.887 us, total = 23.994 ms, Queueing time: mean = 176.792 us, max = 2.947 ms, min = 6.450 us, total = 477.339 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2700 total (1 active), Execution time: mean = 16.214 us, total = 43.778 ms, Queueing time: mean = 73.114 us, max = 2.581 ms, min = 10.666 us, total = 197.408 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2699 total (0 active), Execution time: mean = 96.387 us, total = 260.150 ms, Queueing time: mean = 109.548 us, max = 2.934 ms, min = 4.027 us, total = 295.669 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2699 total (0 active), Execution time: mean = 600.238 us, total = 1.620 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 901 total (1 active), Execution time: mean = 10.330 us, total = 9.307 ms, Queueing time: mean = 74.902 us, max = 442.307 us, min = 14.043 us, total = 67.487 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 540 total (0 active), Execution time: mean = 1.549 ms, total = 836.530 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 540 total (1 active), Execution time: mean = 576.011 us, total = 311.046 ms, Queueing time: mean = 341.569 us, max = 2.010 ms, min = 10.616 us, total = 184.447 ms [state-dump] NodeManager.GcsCheckAlive - 540 total (1 active), Execution time: mean = 299.113 us, total = 161.521 ms, Queueing time: mean = 617.539 us, max = 2.567 ms, min = 6.690 us, total = 333.471 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 540 total (0 active), Execution time: mean = 53.694 us, total = 28.995 ms, Queueing time: mean = 116.313 us, max = 4.779 ms, min = 11.561 us, total = 62.809 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 270 total (1 active), 
Execution time: mean = 1.763 ms, total = 476.074 ms, Queueing time: mean = 67.340 us, max = 163.431 us, min = 13.687 us, total = 18.182 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 45 total (1 active, 1 running), Execution time: mean = 2.657 ms, total = 119.585 ms, Queueing time: mean = 72.714 us, max = 215.454 us, min = 15.835 us, total = 3.272 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 6 total (1 active), Execution time: mean = 399.600 s, total = 2397.602 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 5 total (0 active), Execution time: mean = 
329.314 us, total = 1.647 ms, Queueing time: mean = 112.016 us, max = 183.286 us, min = 20.299 us, total = 560.078 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 7.355 us, total = 29.419 us, Queueing time: mean = 66.631 us, max = 97.290 us, min = 83.871 us, total = 266.523 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 23:47:54,916 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:47:56,053 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 241059 total (35 active) [state-dump] Queueing time: mean = 337.163 us, 
max = 59.826 s, min = -0.001 s, total = 81.276 s [state-dump] Execution time: mean = 10.126 ms, total = 2440.874 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 57939 total (0 active), Execution time: mean = 524.290 us, total = 30.377 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 57939 total (0 active), Execution time: mean = 34.853 us, total = 2.019 s, Queueing time: mean = 108.708 us, max = 2.841 ms, min = 1.846 us, total = 6.298 s [state-dump] ObjectManager.UpdateAvailableMemory - 27575 total (0 active), Execution time: mean = 5.968 us, total = 164.564 ms, Queueing time: mean = 107.784 us, max = 1.662 ms, min = 2.228 us, total = 2.972 s [state-dump] NodeManager.CheckGC - 27575 total (1 active), Execution time: mean = 2.823 us, total = 77.842 ms, Queueing time: mean = 96.529 us, max = 25.875 ms, min = 2.848 us, total = 2.662 s [state-dump] RaySyncer.OnDemandBroadcasting - 27575 total (1 active), Execution time: mean = 10.362 us, total = 285.720 ms, Queueing time: mean = 89.878 us, max = 25.869 ms, min = 6.166 us, total = 2.478 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 13795 total (1 active), Execution time: mean = 17.951 us, total = 247.630 ms, Queueing time: mean = 75.770 us, max = 26.386 ms, min = -0.001 s, total = 1.045 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 11018 total (1 active), Execution time: mean = 451.187 us, total = 4.971 s, Queueing time: mean = 73.693 us, max = 3.532 ms, min = -0.000 s, total = 811.949 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2760 total (1 active), Execution time: mean = 3.082 us, total = 8.505 ms, Queueing time: mean = 180.672 us, max = 2.946 ms, min = 3.958 us, total = 498.655 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2760 total (1 active), Execution 
time: mean = 8.884 us, total = 24.520 ms, Queueing time: mean = 176.691 us, max = 2.947 ms, min = 6.450 us, total = 487.666 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2760 total (1 active), Execution time: mean = 16.219 us, total = 44.764 ms, Queueing time: mean = 73.093 us, max = 2.581 ms, min = 10.666 us, total = 201.737 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2759 total (0 active), Execution time: mean = 96.394 us, total = 265.950 ms, Queueing time: mean = 109.672 us, max = 2.934 ms, min = 4.027 us, total = 302.586 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2759 total (0 active), Execution time: mean = 600.692 us, total = 1.657 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 921 total (1 active), Execution time: mean = 10.360 us, total = 9.542 ms, Queueing time: mean = 74.966 us, max = 442.307 us, min = 14.043 us, total = 69.044 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 552 total (0 active), Execution time: mean = 1.553 ms, total = 857.263 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 552 total (1 active), Execution time: mean = 576.293 us, total = 318.114 ms, Queueing time: mean = 340.642 us, max = 2.010 ms, min = 10.616 us, total = 188.034 ms [state-dump] NodeManager.GcsCheckAlive - 552 total (1 active), Execution time: mean = 300.465 us, total = 165.856 ms, Queueing time: mean = 615.881 us, max = 2.567 ms, min = 6.690 us, total = 339.966 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 552 total (0 active), Execution time: mean = 53.787 us, total = 29.690 ms, Queueing time: mean = 116.091 us, max = 4.779 ms, min = 11.561 us, total = 64.082 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 276 total (1 active), 
Execution time: mean = 1.763 ms, total = 486.515 ms, Queueing time: mean = 67.303 us, max = 163.431 us, min = 13.687 us, total = 18.576 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 46 total (1 active, 1 running), Execution time: mean = 2.669 ms, total = 122.755 ms, Queueing time: mean = 72.603 us, max = 215.454 us, min = 15.835 us, total = 3.340 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 6 total (1 active), Execution time: mean = 399.600 s, total = 2397.602 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 5 total (0 active), Execution time: mean = 
329.314 us, total = 1.647 ms, Queueing time: mean = 112.016 us, max = 183.286 us, min = 20.299 us, total = 560.078 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 7.355 us, total = 29.419 us, Queueing time: mean = 66.631 us, max = 97.290 us, min = 83.871 us, total = 266.523 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 23:48:54,916 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:48:56,056 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 246291 total (35 active) [state-dump] Queueing time: mean = 331.680 us, 
max = 59.826 s, min = -0.001 s, total = 81.690 s [state-dump] Execution time: mean = 9.914 ms, total = 2441.811 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 59199 total (0 active), Execution time: mean = 524.593 us, total = 31.055 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 59199 total (0 active), Execution time: mean = 34.868 us, total = 2.064 s, Queueing time: mean = 108.823 us, max = 2.841 ms, min = 1.846 us, total = 6.442 s [state-dump] ObjectManager.UpdateAvailableMemory - 28174 total (0 active), Execution time: mean = 5.969 us, total = 168.160 ms, Queueing time: mean = 107.921 us, max = 1.662 ms, min = 2.228 us, total = 3.041 s [state-dump] NodeManager.CheckGC - 28174 total (1 active), Execution time: mean = 2.820 us, total = 79.457 ms, Queueing time: mean = 96.546 us, max = 25.875 ms, min = 2.848 us, total = 2.720 s [state-dump] RaySyncer.OnDemandBroadcasting - 28174 total (1 active), Execution time: mean = 10.348 us, total = 291.551 ms, Queueing time: mean = 89.906 us, max = 25.869 ms, min = 6.166 us, total = 2.533 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 14095 total (1 active), Execution time: mean = 17.921 us, total = 252.590 ms, Queueing time: mean = 75.755 us, max = 26.386 ms, min = -0.001 s, total = 1.068 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 11258 total (1 active), Execution time: mean = 451.033 us, total = 5.078 s, Queueing time: mean = 73.751 us, max = 3.532 ms, min = -0.000 s, total = 830.294 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2820 total (1 active), Execution time: mean = 3.077 us, total = 8.678 ms, Queueing time: mean = 180.644 us, max = 2.946 ms, min = 3.958 us, total = 509.416 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2820 total (1 active), Execution 
time: mean = 8.882 us, total = 25.048 ms, Queueing time: mean = 176.659 us, max = 2.947 ms, min = 6.450 us, total = 498.179 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2820 total (1 active), Execution time: mean = 16.253 us, total = 45.834 ms, Queueing time: mean = 73.154 us, max = 2.581 ms, min = 10.666 us, total = 206.293 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2819 total (0 active), Execution time: mean = 96.368 us, total = 271.660 ms, Queueing time: mean = 109.955 us, max = 2.934 ms, min = 4.027 us, total = 309.962 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2819 total (0 active), Execution time: mean = 601.586 us, total = 1.696 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 941 total (1 active), Execution time: mean = 10.318 us, total = 9.709 ms, Queueing time: mean = 74.858 us, max = 442.307 us, min = 14.043 us, total = 70.442 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 564 total (0 active), Execution time: mean = 1.555 ms, total = 877.219 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 564 total (1 active), Execution time: mean = 576.076 us, total = 324.907 ms, Queueing time: mean = 340.775 us, max = 2.010 ms, min = 10.616 us, total = 192.197 ms [state-dump] NodeManager.GcsCheckAlive - 564 total (1 active), Execution time: mean = 300.712 us, total = 169.602 ms, Queueing time: mean = 615.446 us, max = 2.567 ms, min = 6.690 us, total = 347.112 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 564 total (0 active), Execution time: mean = 53.803 us, total = 30.345 ms, Queueing time: mean = 116.004 us, max = 4.779 ms, min = 11.561 us, total = 65.426 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 282 total (1 active), 
Execution time: mean = 1.762 ms, total = 496.845 ms, Queueing time: mean = 67.448 us, max = 163.431 us, min = 13.687 us, total = 19.020 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 47 total (1 active, 1 running), Execution time: mean = 2.677 ms, total = 125.835 ms, Queueing time: mean = 72.523 us, max = 215.454 us, min = 15.835 us, total = 3.409 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 6 total (1 active), Execution time: mean = 399.600 s, total = 2397.602 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 5 total (0 active), Execution time: mean = 
329.314 us, total = 1.647 ms, Queueing time: mean = 112.016 us, max = 183.286 us, min = 20.299 us, total = 560.078 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 7.355 us, total = 29.419 us, Queueing time: mean = 66.631 us, max = 97.290 us, min = 83.871 us, total = 266.523 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 23:49:54,916 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:49:56,059 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 251522 total (35 active) [state-dump] Queueing time: mean = 326.266 us, 
max = 59.826 s, min = -0.001 s, total = 82.063 s [state-dump] Execution time: mean = 9.712 ms, total = 2442.673 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 60459 total (0 active), Execution time: mean = 523.811 us, total = 31.669 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 60459 total (0 active), Execution time: mean = 34.820 us, total = 2.105 s, Queueing time: mean = 108.567 us, max = 2.841 ms, min = 1.846 us, total = 6.564 s [state-dump] ObjectManager.UpdateAvailableMemory - 28773 total (0 active), Execution time: mean = 5.961 us, total = 171.505 ms, Queueing time: mean = 107.656 us, max = 1.662 ms, min = 2.228 us, total = 3.098 s [state-dump] NodeManager.CheckGC - 28773 total (1 active), Execution time: mean = 2.820 us, total = 81.154 ms, Queueing time: mean = 96.480 us, max = 25.875 ms, min = 2.848 us, total = 2.776 s [state-dump] RaySyncer.OnDemandBroadcasting - 28773 total (1 active), Execution time: mean = 10.336 us, total = 297.387 ms, Queueing time: mean = 89.851 us, max = 25.869 ms, min = 6.166 us, total = 2.585 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 14395 total (1 active), Execution time: mean = 17.916 us, total = 257.907 ms, Queueing time: mean = 75.735 us, max = 26.386 ms, min = -0.001 s, total = 1.090 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 11497 total (1 active), Execution time: mean = 450.871 us, total = 5.184 s, Queueing time: mean = 73.629 us, max = 3.532 ms, min = -0.000 s, total = 846.509 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2880 total (1 active), Execution time: mean = 3.074 us, total = 8.852 ms, Queueing time: mean = 180.761 us, max = 2.946 ms, min = 3.958 us, total = 520.592 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2880 total (1 active), Execution 
time: mean = 8.876 us, total = 25.563 ms, Queueing time: mean = 176.775 us, max = 2.947 ms, min = 6.450 us, total = 509.112 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2880 total (1 active), Execution time: mean = 16.250 us, total = 46.800 ms, Queueing time: mean = 73.087 us, max = 2.581 ms, min = 10.666 us, total = 210.491 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2879 total (0 active), Execution time: mean = 96.263 us, total = 277.142 ms, Queueing time: mean = 109.759 us, max = 2.934 ms, min = 4.027 us, total = 315.996 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2879 total (0 active), Execution time: mean = 600.683 us, total = 1.729 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 961 total (1 active), Execution time: mean = 10.315 us, total = 9.912 ms, Queueing time: mean = 74.884 us, max = 442.307 us, min = 14.043 us, total = 71.963 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 576 total (0 active), Execution time: mean = 1.556 ms, total = 896.406 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 576 total (1 active), Execution time: mean = 575.891 us, total = 331.713 ms, Queueing time: mean = 341.650 us, max = 2.010 ms, min = 10.616 us, total = 196.790 ms [state-dump] NodeManager.GcsCheckAlive - 576 total (1 active), Execution time: mean = 300.886 us, total = 173.310 ms, Queueing time: mean = 615.914 us, max = 2.567 ms, min = 6.690 us, total = 354.767 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 576 total (0 active), Execution time: mean = 53.738 us, total = 30.953 ms, Queueing time: mean = 115.477 us, max = 4.779 ms, min = 11.561 us, total = 66.515 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 288 total (1 active), 
Execution time: mean = 1.764 ms, total = 507.923 ms, Queueing time: mean = 67.297 us, max = 163.431 us, min = 13.687 us, total = 19.381 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 48 total (1 active, 1 running), Execution time: mean = 2.678 ms, total = 128.547 ms, Queueing time: mean = 73.563 us, max = 215.454 us, min = 15.835 us, total = 3.531 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 6 total (1 active), Execution time: mean = 399.600 s, total = 2397.602 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 5 total (0 active), Execution time: mean = 
329.314 us, total = 1.647 ms, Queueing time: mean = 112.016 us, max = 183.286 us, min = 20.299 us, total = 560.078 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 7.355 us, total = 29.419 us, Queueing time: mean = 66.631 us, max = 97.290 us, min = 83.871 us, total = 266.523 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 23:50:54,917 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:50:56,062 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 256757 total (35 active) [state-dump] Queueing time: mean = 321.007 us, 
max = 59.826 s, min = -0.001 s, total = 82.421 s [state-dump] Execution time: mean = 9.517 ms, total = 2443.531 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 61719 total (0 active), Execution time: mean = 522.992 us, total = 32.279 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 61719 total (0 active), Execution time: mean = 34.768 us, total = 2.146 s, Queueing time: mean = 108.306 us, max = 2.841 ms, min = 1.846 us, total = 6.685 s [state-dump] ObjectManager.UpdateAvailableMemory - 29373 total (0 active), Execution time: mean = 5.951 us, total = 174.786 ms, Queueing time: mean = 107.179 us, max = 1.662 ms, min = 2.228 us, total = 3.148 s [state-dump] NodeManager.CheckGC - 29373 total (1 active), Execution time: mean = 2.822 us, total = 82.896 ms, Queueing time: mean = 96.345 us, max = 25.875 ms, min = 2.848 us, total = 2.830 s [state-dump] RaySyncer.OnDemandBroadcasting - 29373 total (1 active), Execution time: mean = 10.346 us, total = 303.892 ms, Queueing time: mean = 89.708 us, max = 25.869 ms, min = 6.166 us, total = 2.635 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 14695 total (1 active), Execution time: mean = 17.934 us, total = 263.539 ms, Queueing time: mean = 75.677 us, max = 26.386 ms, min = -0.001 s, total = 1.112 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 11737 total (1 active), Execution time: mean = 450.857 us, total = 5.292 s, Queueing time: mean = 73.647 us, max = 3.532 ms, min = -0.000 s, total = 864.391 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 2940 total (1 active), Execution time: mean = 3.072 us, total = 9.033 ms, Queueing time: mean = 180.547 us, max = 2.946 ms, min = 3.958 us, total = 530.808 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 2940 total (1 active), Execution 
time: mean = 8.873 us, total = 26.088 ms, Queueing time: mean = 176.564 us, max = 2.947 ms, min = 3.811 us, total = 519.098 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 2940 total (1 active), Execution time: mean = 16.299 us, total = 47.919 ms, Queueing time: mean = 72.951 us, max = 2.581 ms, min = 10.666 us, total = 214.476 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2939 total (0 active), Execution time: mean = 96.133 us, total = 282.535 ms, Queueing time: mean = 109.525 us, max = 2.934 ms, min = 4.027 us, total = 321.895 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2939 total (0 active), Execution time: mean = 599.896 us, total = 1.763 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 981 total (1 active), Execution time: mean = 10.307 us, total = 10.111 ms, Queueing time: mean = 74.785 us, max = 442.307 us, min = 14.043 us, total = 73.364 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 588 total (0 active), Execution time: mean = 1.556 ms, total = 914.637 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 588 total (1 active), Execution time: mean = 575.974 us, total = 338.673 ms, Queueing time: mean = 340.462 us, max = 2.010 ms, min = 10.616 us, total = 200.191 ms [state-dump] NodeManager.GcsCheckAlive - 588 total (1 active), Execution time: mean = 301.102 us, total = 177.048 ms, Queueing time: mean = 614.606 us, max = 2.567 ms, min = 6.690 us, total = 361.388 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 588 total (0 active), Execution time: mean = 53.669 us, total = 31.557 ms, Queueing time: mean = 115.035 us, max = 4.779 ms, min = 11.561 us, total = 67.640 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 294 total (1 
active), Execution time: mean = 1.761 ms, total = 517.788 ms, Queueing time: mean = 67.133 us, max = 163.431 us, min = 13.687 us, total = 19.737 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 49 total (1 active, 1 running), Execution time: mean = 2.682 ms, total = 131.410 ms, Queueing time: mean = 73.422 us, max = 215.454 us, min = 15.835 us, total = 3.598 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: 
mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 6 total (1 active), Execution time: mean = 399.600 s, total = 2397.602 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 5 total (0 active), Execution time: 
mean = 329.314 us, total = 1.647 ms, Queueing time: mean = 112.016 us, max = 183.286 us, min = 20.299 us, total = 560.078 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 7.355 us, total = 29.419 us, Queueing time: mean = 66.631 us, max = 97.290 us, min = 83.871 us, total = 266.523 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-20 23:51:54,917 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:51:56,065 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 261989 total (35 active) [state-dump] Queueing time: mean = 316.223 us, 
max = 59.826 s, min = -0.001 s, total = 82.847 s [state-dump] Execution time: mean = 11.621 ms, total = 3044.498 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 62979 total (0 active), Execution time: mean = 523.583 us, total = 32.975 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 62979 total (0 active), Execution time: mean = 34.831 us, total = 2.194 s, Queueing time: mean = 108.429 us, max = 2.841 ms, min = 1.846 us, total = 6.829 s [state-dump] ObjectManager.UpdateAvailableMemory - 29972 total (0 active), Execution time: mean = 5.960 us, total = 178.646 ms, Queueing time: mean = 107.463 us, max = 1.662 ms, min = 2.228 us, total = 3.221 s [state-dump] NodeManager.CheckGC - 29972 total (1 active), Execution time: mean = 2.822 us, total = 84.585 ms, Queueing time: mean = 96.471 us, max = 25.875 ms, min = 2.848 us, total = 2.891 s [state-dump] RaySyncer.OnDemandBroadcasting - 29972 total (1 active), Execution time: mean = 10.348 us, total = 310.144 ms, Queueing time: mean = 89.832 us, max = 25.869 ms, min = 6.166 us, total = 2.692 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 14994 total (1 active), Execution time: mean = 17.926 us, total = 268.781 ms, Queueing time: mean = 75.686 us, max = 26.386 ms, min = -0.001 s, total = 1.135 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 11976 total (1 active), Execution time: mean = 451.262 us, total = 5.404 s, Queueing time: mean = 73.701 us, max = 3.532 ms, min = -0.000 s, total = 882.649 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3000 total (1 active), Execution time: mean = 3.072 us, total = 9.217 ms, Queueing time: mean = 180.658 us, max = 2.946 ms, min = 3.958 us, total = 541.974 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 3000 total (1 active), Execution 
time: mean = 8.878 us, total = 26.633 ms, Queueing time: mean = 176.670 us, max = 2.947 ms, min = 3.811 us, total = 530.011 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3000 total (1 active), Execution time: mean = 16.368 us, total = 49.105 ms, Queueing time: mean = 72.990 us, max = 2.581 ms, min = 10.666 us, total = 218.971 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 2999 total (0 active), Execution time: mean = 96.153 us, total = 288.364 ms, Queueing time: mean = 109.647 us, max = 2.934 ms, min = 4.027 us, total = 328.830 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 2999 total (0 active), Execution time: mean = 600.224 us, total = 1.800 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1001 total (1 active), Execution time: mean = 10.300 us, total = 10.310 ms, Queueing time: mean = 75.007 us, max = 442.307 us, min = 14.043 us, total = 75.082 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 600 total (0 active), Execution time: mean = 1.559 ms, total = 935.228 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 600 total (1 active), Execution time: mean = 575.987 us, total = 345.592 ms, Queueing time: mean = 340.708 us, max = 2.010 ms, min = 10.616 us, total = 204.425 ms [state-dump] NodeManager.GcsCheckAlive - 600 total (1 active), Execution time: mean = 301.127 us, total = 180.676 ms, Queueing time: mean = 614.948 us, max = 2.567 ms, min = 6.690 us, total = 368.969 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 600 total (0 active), Execution time: mean = 53.805 us, total = 32.283 ms, Queueing time: mean = 115.132 us, max = 4.779 ms, min = 11.561 us, total = 69.079 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 300 total (1 
active), Execution time: mean = 1.762 ms, total = 528.538 ms, Queueing time: mean = 67.458 us, max = 163.431 us, min = 13.687 us, total = 20.237 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 50 total (1 active, 1 running), Execution time: mean = 2.684 ms, total = 134.206 ms, Queueing time: mean = 73.154 us, max = 215.454 us, min = 15.835 us, total = 3.658 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: 
mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 7 total (1 active), Execution time: mean = 428.229 s, total = 2997.604 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 6 total (0 active), Execution time: 
mean = 351.177 us, total = 2.107 ms, Queueing time: mean = 117.415 us, max = 183.286 us, min = 20.299 us, total = 704.491 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 7.355 us, total = 29.419 us, Queueing time: mean = 66.631 us, max = 97.290 us, min = 83.871 us, total = 266.523 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:52:54,917 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:52:56,068 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 267224 total (35 active) [state-dump] Queueing time: mean = 311.654 us, 
max = 59.826 s, min = -0.001 s, total = 83.281 s [state-dump] Execution time: mean = 11.397 ms, total = 3045.485 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 64239 total (0 active), Execution time: mean = 524.494 us, total = 33.693 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 64239 total (0 active), Execution time: mean = 34.869 us, total = 2.240 s, Queueing time: mean = 108.658 us, max = 2.841 ms, min = 1.846 us, total = 6.980 s [state-dump] ObjectManager.UpdateAvailableMemory - 30572 total (0 active), Execution time: mean = 5.967 us, total = 182.416 ms, Queueing time: mean = 107.799 us, max = 1.662 ms, min = 2.228 us, total = 3.296 s [state-dump] NodeManager.CheckGC - 30572 total (1 active), Execution time: mean = 2.822 us, total = 86.260 ms, Queueing time: mean = 96.515 us, max = 25.875 ms, min = 2.848 us, total = 2.951 s [state-dump] RaySyncer.OnDemandBroadcasting - 30572 total (1 active), Execution time: mean = 10.339 us, total = 316.093 ms, Queueing time: mean = 89.884 us, max = 25.869 ms, min = 6.166 us, total = 2.748 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 15294 total (1 active), Execution time: mean = 17.922 us, total = 274.102 ms, Queueing time: mean = 75.767 us, max = 26.386 ms, min = -0.001 s, total = 1.159 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 12216 total (1 active), Execution time: mean = 451.518 us, total = 5.516 s, Queueing time: mean = 73.785 us, max = 3.532 ms, min = -0.000 s, total = 901.360 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3060 total (1 active), Execution time: mean = 3.072 us, total = 9.401 ms, Queueing time: mean = 180.924 us, max = 2.946 ms, min = 3.958 us, total = 553.628 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 3060 total (1 active), Execution 
time: mean = 8.902 us, total = 27.239 ms, Queueing time: mean = 176.925 us, max = 2.947 ms, min = 3.811 us, total = 541.392 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3060 total (1 active), Execution time: mean = 16.393 us, total = 50.161 ms, Queueing time: mean = 73.153 us, max = 2.581 ms, min = 10.666 us, total = 223.847 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3059 total (0 active), Execution time: mean = 96.103 us, total = 293.978 ms, Queueing time: mean = 109.856 us, max = 2.934 ms, min = 4.027 us, total = 336.050 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3059 total (0 active), Execution time: mean = 601.327 us, total = 1.839 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1021 total (1 active), Execution time: mean = 10.275 us, total = 10.491 ms, Queueing time: mean = 75.029 us, max = 442.307 us, min = 14.043 us, total = 76.605 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 612 total (0 active), Execution time: mean = 1.562 ms, total = 955.867 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 612 total (1 active), Execution time: mean = 577.476 us, total = 353.415 ms, Queueing time: mean = 340.799 us, max = 2.010 ms, min = 10.616 us, total = 208.569 ms [state-dump] NodeManager.GcsCheckAlive - 612 total (1 active), Execution time: mean = 301.021 us, total = 184.225 ms, Queueing time: mean = 616.524 us, max = 2.567 ms, min = 6.690 us, total = 377.313 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 612 total (0 active), Execution time: mean = 53.886 us, total = 32.978 ms, Queueing time: mean = 115.174 us, max = 4.779 ms, min = 11.561 us, total = 70.487 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 306 total (1 
active), Execution time: mean = 1.764 ms, total = 539.924 ms, Queueing time: mean = 67.636 us, max = 163.431 us, min = 13.687 us, total = 20.697 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 51 total (1 active, 1 running), Execution time: mean = 2.689 ms, total = 137.157 ms, Queueing time: mean = 73.283 us, max = 215.454 us, min = 15.835 us, total = 3.737 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: 
mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 7 total (1 active), Execution time: mean = 428.229 s, total = 2997.604 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 6 total (0 active), Execution time: 
mean = 351.177 us, total = 2.107 ms, Queueing time: mean = 117.415 us, max = 183.286 us, min = 20.299 us, total = 704.491 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 7.355 us, total = 29.419 us, Queueing time: mean = 66.631 us, max = 97.290 us, min = 83.871 us, total = 266.523 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:53:54,918 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:53:56,071 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 272455 total (35 active) [state-dump] Queueing time: mean = 307.169 us, 
max = 59.826 s, min = -0.001 s, total = 83.690 s [state-dump] Execution time: mean = 11.181 ms, total = 3046.439 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 65499 total (0 active), Execution time: mean = 524.989 us, total = 34.386 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 65499 total (0 active), Execution time: mean = 34.888 us, total = 2.285 s, Queueing time: mean = 108.784 us, max = 2.841 ms, min = 1.846 us, total = 7.125 s [state-dump] ObjectManager.UpdateAvailableMemory - 31171 total (0 active), Execution time: mean = 5.974 us, total = 186.207 ms, Queueing time: mean = 107.871 us, max = 1.662 ms, min = 2.228 us, total = 3.362 s [state-dump] NodeManager.CheckGC - 31171 total (1 active), Execution time: mean = 2.821 us, total = 87.938 ms, Queueing time: mean = 96.499 us, max = 25.875 ms, min = 2.848 us, total = 3.008 s [state-dump] RaySyncer.OnDemandBroadcasting - 31171 total (1 active), Execution time: mean = 10.339 us, total = 322.289 ms, Queueing time: mean = 89.868 us, max = 25.869 ms, min = 6.166 us, total = 2.801 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 15594 total (1 active), Execution time: mean = 17.941 us, total = 279.766 ms, Queueing time: mean = 75.757 us, max = 26.386 ms, min = -0.001 s, total = 1.181 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 12455 total (1 active), Execution time: mean = 451.596 us, total = 5.625 s, Queueing time: mean = 73.874 us, max = 3.532 ms, min = -0.000 s, total = 920.100 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3120 total (1 active), Execution time: mean = 3.069 us, total = 9.575 ms, Queueing time: mean = 180.663 us, max = 2.946 ms, min = 3.958 us, total = 563.668 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 3120 total (1 active), Execution 
time: mean = 8.895 us, total = 27.751 ms, Queueing time: mean = 176.665 us, max = 2.947 ms, min = 3.811 us, total = 551.195 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3120 total (1 active), Execution time: mean = 16.403 us, total = 51.176 ms, Queueing time: mean = 73.206 us, max = 2.581 ms, min = 10.666 us, total = 228.403 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3119 total (0 active), Execution time: mean = 96.094 us, total = 299.717 ms, Queueing time: mean = 109.989 us, max = 2.934 ms, min = 4.027 us, total = 343.054 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3119 total (0 active), Execution time: mean = 602.023 us, total = 1.878 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1041 total (1 active), Execution time: mean = 10.265 us, total = 10.685 ms, Queueing time: mean = 75.096 us, max = 442.307 us, min = 14.043 us, total = 78.175 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 624 total (0 active), Execution time: mean = 1.565 ms, total = 976.519 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 624 total (1 active), Execution time: mean = 577.335 us, total = 360.257 ms, Queueing time: mean = 339.296 us, max = 2.010 ms, min = 9.115 us, total = 211.721 ms [state-dump] NodeManager.GcsCheckAlive - 624 total (1 active), Execution time: mean = 300.999 us, total = 187.823 ms, Queueing time: mean = 615.379 us, max = 2.567 ms, min = 6.690 us, total = 383.997 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 624 total (0 active), Execution time: mean = 53.930 us, total = 33.653 ms, Queueing time: mean = 114.981 us, max = 4.779 ms, min = 11.561 us, total = 71.748 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 312 total (1 
active), Execution time: mean = 1.761 ms, total = 549.527 ms, Queueing time: mean = 67.711 us, max = 163.431 us, min = 11.609 us, total = 21.126 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 52 total (1 active, 1 running), Execution time: mean = 2.688 ms, total = 139.766 ms, Queueing time: mean = 73.024 us, max = 215.454 us, min = 15.835 us, total = 3.797 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: 
mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 7 total (1 active), Execution time: mean = 428.229 s, total = 2997.604 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 6 total (0 active), Execution time: 
mean = 351.177 us, total = 2.107 ms, Queueing time: mean = 117.415 us, max = 183.286 us, min = 20.299 us, total = 704.491 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 7.355 us, total = 29.419 us, Queueing time: mean = 66.631 us, max = 97.290 us, min = 83.871 us, total = 266.523 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:54:54,918 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:54:56,074 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 277690 total (35 active) [state-dump] Queueing time: mean = 302.893 us, 
max = 59.826 s, min = -0.001 s, total = 84.110 s [state-dump] Execution time: mean = 10.974 ms, total = 3047.391 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 66759 total (0 active), Execution time: mean = 525.372 us, total = 35.073 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 66759 total (0 active), Execution time: mean = 34.915 us, total = 2.331 s, Queueing time: mean = 108.903 us, max = 2.841 ms, min = 1.846 us, total = 7.270 s [state-dump] ObjectManager.UpdateAvailableMemory - 31771 total (0 active), Execution time: mean = 5.976 us, total = 189.858 ms, Queueing time: mean = 108.093 us, max = 1.662 ms, min = 2.228 us, total = 3.434 s [state-dump] NodeManager.CheckGC - 31771 total (1 active), Execution time: mean = 2.818 us, total = 89.530 ms, Queueing time: mean = 96.529 us, max = 25.875 ms, min = 2.848 us, total = 3.067 s [state-dump] RaySyncer.OnDemandBroadcasting - 31771 total (1 active), Execution time: mean = 10.331 us, total = 328.216 ms, Queueing time: mean = 89.904 us, max = 25.869 ms, min = 6.166 us, total = 2.856 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 15894 total (1 active), Execution time: mean = 17.906 us, total = 284.594 ms, Queueing time: mean = 75.708 us, max = 26.386 ms, min = -0.001 s, total = 1.203 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 12695 total (1 active), Execution time: mean = 451.896 us, total = 5.737 s, Queueing time: mean = 73.949 us, max = 3.532 ms, min = -0.000 s, total = 938.779 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3180 total (1 active), Execution time: mean = 3.071 us, total = 9.765 ms, Queueing time: mean = 180.719 us, max = 2.946 ms, min = 3.958 us, total = 574.687 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 3180 total (1 active), Execution 
time: mean = 8.899 us, total = 28.299 ms, Queueing time: mean = 176.720 us, max = 2.947 ms, min = 3.811 us, total = 561.969 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3180 total (1 active), Execution time: mean = 16.389 us, total = 52.118 ms, Queueing time: mean = 73.315 us, max = 2.581 ms, min = 10.666 us, total = 233.140 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3179 total (0 active), Execution time: mean = 96.061 us, total = 305.378 ms, Queueing time: mean = 110.105 us, max = 2.934 ms, min = 4.027 us, total = 350.024 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3179 total (0 active), Execution time: mean = 602.549 us, total = 1.916 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1061 total (1 active), Execution time: mean = 10.234 us, total = 10.858 ms, Queueing time: mean = 75.077 us, max = 442.307 us, min = 14.043 us, total = 79.656 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 636 total (0 active), Execution time: mean = 1.568 ms, total = 996.938 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 636 total (1 active), Execution time: mean = 577.201 us, total = 367.100 ms, Queueing time: mean = 340.240 us, max = 2.010 ms, min = 9.115 us, total = 216.392 ms [state-dump] NodeManager.GcsCheckAlive - 636 total (1 active), Execution time: mean = 300.973 us, total = 191.419 ms, Queueing time: mean = 615.748 us, max = 2.567 ms, min = 6.690 us, total = 391.616 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 636 total (0 active), Execution time: mean = 54.080 us, total = 34.395 ms, Queueing time: mean = 114.753 us, max = 4.779 ms, min = 11.561 us, total = 72.983 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 318 total (1 
active), Execution time: mean = 1.762 ms, total = 560.385 ms, Queueing time: mean = 67.900 us, max = 163.431 us, min = 11.609 us, total = 21.592 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 53 total (1 active, 1 running), Execution time: mean = 2.694 ms, total = 142.767 ms, Queueing time: mean = 73.029 us, max = 215.454 us, min = 15.835 us, total = 3.871 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: 
mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 7 total (1 active), Execution time: mean = 428.229 s, total = 2997.604 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 6 total (0 active), Execution time: 
mean = 351.177 us, total = 2.107 ms, Queueing time: mean = 117.415 us, max = 183.286 us, min = 20.299 us, total = 704.491 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 7.355 us, total = 29.419 us, Queueing time: mean = 66.631 us, max = 97.290 us, min = 83.871 us, total = 266.523 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:55:54,918 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:55:56,077 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 282921 total (35 active) [state-dump] Queueing time: mean = 298.795 us, 
max = 59.826 s, min = -0.001 s, total = 84.535 s [state-dump] Execution time: mean = 10.775 ms, total = 3048.370 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 68019 total (0 active), Execution time: mean = 526.107 us, total = 35.785 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 68019 total (0 active), Execution time: mean = 34.959 us, total = 2.378 s, Queueing time: mean = 109.058 us, max = 2.841 ms, min = 1.846 us, total = 7.418 s [state-dump] ObjectManager.UpdateAvailableMemory - 32370 total (0 active), Execution time: mean = 5.984 us, total = 193.691 ms, Queueing time: mean = 108.274 us, max = 1.662 ms, min = 2.228 us, total = 3.505 s [state-dump] NodeManager.CheckGC - 32370 total (1 active), Execution time: mean = 2.818 us, total = 91.232 ms, Queueing time: mean = 96.559 us, max = 25.875 ms, min = 2.848 us, total = 3.126 s [state-dump] RaySyncer.OnDemandBroadcasting - 32370 total (1 active), Execution time: mean = 10.333 us, total = 334.484 ms, Queueing time: mean = 89.931 us, max = 25.869 ms, min = 6.166 us, total = 2.911 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 16194 total (1 active), Execution time: mean = 17.926 us, total = 290.293 ms, Queueing time: mean = 75.776 us, max = 26.386 ms, min = -0.001 s, total = 1.227 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 12934 total (1 active), Execution time: mean = 452.200 us, total = 5.849 s, Queueing time: mean = 74.101 us, max = 3.532 ms, min = -0.000 s, total = 958.422 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3240 total (1 active), Execution time: mean = 3.069 us, total = 9.945 ms, Queueing time: mean = 180.859 us, max = 2.946 ms, min = 3.958 us, total = 585.982 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 3240 total (1 active), Execution 
time: mean = 8.906 us, total = 28.854 ms, Queueing time: mean = 176.852 us, max = 2.947 ms, min = 3.811 us, total = 572.999 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3240 total (1 active), Execution time: mean = 16.385 us, total = 53.087 ms, Queueing time: mean = 73.345 us, max = 2.581 ms, min = 10.666 us, total = 237.638 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3239 total (0 active), Execution time: mean = 96.135 us, total = 311.382 ms, Queueing time: mean = 110.288 us, max = 2.934 ms, min = 4.027 us, total = 357.222 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3239 total (0 active), Execution time: mean = 602.875 us, total = 1.953 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1081 total (1 active), Execution time: mean = 10.211 us, total = 11.038 ms, Queueing time: mean = 75.022 us, max = 442.307 us, min = 14.043 us, total = 81.099 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 648 total (0 active), Execution time: mean = 1.570 ms, total = 1.018 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 648 total (1 active), Execution time: mean = 576.551 us, total = 373.605 ms, Queueing time: mean = 341.285 us, max = 2.010 ms, min = 9.115 us, total = 221.153 ms [state-dump] NodeManager.GcsCheckAlive - 648 total (1 active), Execution time: mean = 301.322 us, total = 195.257 ms, Queueing time: mean = 615.836 us, max = 2.567 ms, min = 6.690 us, total = 399.062 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 648 total (0 active), Execution time: mean = 54.158 us, total = 35.094 ms, Queueing time: mean = 114.733 us, max = 4.779 ms, min = 11.561 us, total = 74.347 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 324 total (1 active), 
Execution time: mean = 1.764 ms, total = 571.674 ms, Queueing time: mean = 68.234 us, max = 163.431 us, min = 11.609 us, total = 22.108 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 54 total (1 active, 1 running), Execution time: mean = 2.699 ms, total = 145.728 ms, Queueing time: mean = 74.246 us, max = 215.454 us, min = 15.835 us, total = 4.009 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 7 total (1 active), Execution time: mean = 428.229 s, total = 2997.604 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 6 total (0 active), Execution time: mean = 
351.177 us, total = 2.107 ms, Queueing time: mean = 117.415 us, max = 183.286 us, min = 20.299 us, total = 704.491 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 7.355 us, total = 29.419 us, Queueing time: mean = 66.631 us, max = 97.290 us, min = 83.871 us, total = 266.523 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:56:54,919 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:56:56,080 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 288153 total (35 active) [state-dump] Queueing time: mean = 294.895 us, 
max = 59.826 s, min = -0.001 s, total = 84.975 s [state-dump] Execution time: mean = 10.582 ms, total = 3049.343 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 69279 total (0 active), Execution time: mean = 526.648 us, total = 36.486 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 69279 total (0 active), Execution time: mean = 34.999 us, total = 2.425 s, Queueing time: mean = 109.212 us, max = 2.841 ms, min = 1.846 us, total = 7.566 s [state-dump] ObjectManager.UpdateAvailableMemory - 32969 total (0 active), Execution time: mean = 5.997 us, total = 197.709 ms, Queueing time: mean = 108.575 us, max = 1.662 ms, min = 2.228 us, total = 3.580 s [state-dump] NodeManager.CheckGC - 32969 total (1 active), Execution time: mean = 2.821 us, total = 93.017 ms, Queueing time: mean = 96.736 us, max = 25.875 ms, min = 2.848 us, total = 3.189 s [state-dump] RaySyncer.OnDemandBroadcasting - 32969 total (1 active), Execution time: mean = 10.355 us, total = 341.400 ms, Queueing time: mean = 90.091 us, max = 25.869 ms, min = 6.166 us, total = 2.970 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 16494 total (1 active), Execution time: mean = 17.982 us, total = 296.596 ms, Queueing time: mean = 75.827 us, max = 26.386 ms, min = -0.001 s, total = 1.251 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 13174 total (1 active), Execution time: mean = 452.506 us, total = 5.961 s, Queueing time: mean = 74.217 us, max = 3.532 ms, min = -0.000 s, total = 977.730 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3300 total (1 active), Execution time: mean = 3.077 us, total = 10.155 ms, Queueing time: mean = 181.173 us, max = 2.946 ms, min = 3.958 us, total = 597.872 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 3300 total (1 active), Execution 
time: mean = 8.927 us, total = 29.459 ms, Queueing time: mean = 177.156 us, max = 2.947 ms, min = 3.811 us, total = 584.614 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3300 total (1 active), Execution time: mean = 16.404 us, total = 54.132 ms, Queueing time: mean = 73.383 us, max = 2.581 ms, min = 10.666 us, total = 242.164 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3299 total (0 active), Execution time: mean = 96.198 us, total = 317.356 ms, Queueing time: mean = 110.471 us, max = 2.934 ms, min = 4.027 us, total = 364.443 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3299 total (0 active), Execution time: mean = 603.524 us, total = 1.991 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1101 total (1 active), Execution time: mean = 10.183 us, total = 11.212 ms, Queueing time: mean = 74.956 us, max = 442.307 us, min = 14.043 us, total = 82.527 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 660 total (0 active), Execution time: mean = 1.572 ms, total = 1.038 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 660 total (1 active), Execution time: mean = 578.360 us, total = 381.718 ms, Queueing time: mean = 341.184 us, max = 2.010 ms, min = 9.115 us, total = 225.181 ms [state-dump] NodeManager.GcsCheckAlive - 660 total (1 active), Execution time: mean = 301.848 us, total = 199.220 ms, Queueing time: mean = 617.123 us, max = 2.567 ms, min = 6.690 us, total = 407.301 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 660 total (0 active), Execution time: mean = 54.240 us, total = 35.798 ms, Queueing time: mean = 114.792 us, max = 4.779 ms, min = 11.561 us, total = 75.763 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 330 total (1 active), 
Execution time: mean = 1.767 ms, total = 583.169 ms, Queueing time: mean = 68.499 us, max = 163.431 us, min = 11.609 us, total = 22.605 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 55 total (1 active, 1 running), Execution time: mean = 2.707 ms, total = 148.897 ms, Queueing time: mean = 73.769 us, max = 215.454 us, min = 15.835 us, total = 4.057 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 7 total (1 active), Execution time: mean = 428.229 s, total = 2997.604 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 6 total (0 active), Execution time: mean = 
351.177 us, total = 2.107 ms, Queueing time: mean = 117.415 us, max = 183.286 us, min = 20.299 us, total = 704.491 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 7.355 us, total = 29.419 us, Queueing time: mean = 66.631 us, max = 97.290 us, min = 83.871 us, total = 266.523 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:57:54,919 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:57:56,081 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 293387 total (35 active) [state-dump] Queueing time: mean = 291.001 us, 
max = 59.826 s, min = -0.001 s, total = 85.376 s [state-dump] Execution time: mean = 10.397 ms, total = 3050.254 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 70539 total (0 active), Execution time: mean = 526.497 us, total = 37.139 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 70539 total (0 active), Execution time: mean = 34.979 us, total = 2.467 s, Queueing time: mean = 109.133 us, max = 2.841 ms, min = 1.846 us, total = 7.698 s [state-dump] ObjectManager.UpdateAvailableMemory - 33569 total (0 active), Execution time: mean = 6.000 us, total = 201.424 ms, Queueing time: mean = 108.618 us, max = 1.662 ms, min = 2.228 us, total = 3.646 s [state-dump] NodeManager.CheckGC - 33569 total (1 active), Execution time: mean = 2.825 us, total = 94.821 ms, Queueing time: mean = 96.807 us, max = 25.875 ms, min = 2.848 us, total = 3.250 s [state-dump] RaySyncer.OnDemandBroadcasting - 33569 total (1 active), Execution time: mean = 10.367 us, total = 348.013 ms, Queueing time: mean = 90.153 us, max = 25.869 ms, min = 6.166 us, total = 3.026 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 16794 total (1 active), Execution time: mean = 18.004 us, total = 302.359 ms, Queueing time: mean = 75.704 us, max = 26.386 ms, min = -0.001 s, total = 1.271 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 13413 total (1 active), Execution time: mean = 452.512 us, total = 6.070 s, Queueing time: mean = 74.332 us, max = 3.532 ms, min = -0.000 s, total = 997.013 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3360 total (1 active), Execution time: mean = 3.076 us, total = 10.336 ms, Queueing time: mean = 181.105 us, max = 2.946 ms, min = 3.958 us, total = 608.513 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 3360 total (1 active), Execution 
time: mean = 8.935 us, total = 30.022 ms, Queueing time: mean = 177.088 us, max = 2.947 ms, min = 3.811 us, total = 595.017 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3360 total (1 active), Execution time: mean = 16.440 us, total = 55.237 ms, Queueing time: mean = 73.335 us, max = 2.581 ms, min = 10.666 us, total = 246.406 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3359 total (0 active), Execution time: mean = 96.141 us, total = 322.938 ms, Queueing time: mean = 110.404 us, max = 2.934 ms, min = 4.027 us, total = 370.846 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3359 total (0 active), Execution time: mean = 603.624 us, total = 2.028 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1121 total (1 active), Execution time: mean = 10.166 us, total = 11.396 ms, Queueing time: mean = 74.843 us, max = 442.307 us, min = 14.043 us, total = 83.899 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 672 total (0 active), Execution time: mean = 1.574 ms, total = 1.058 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 672 total (1 active), Execution time: mean = 578.112 us, total = 388.492 ms, Queueing time: mean = 341.268 us, max = 2.010 ms, min = 9.115 us, total = 229.332 ms [state-dump] NodeManager.GcsCheckAlive - 672 total (1 active), Execution time: mean = 302.424 us, total = 203.229 ms, Queueing time: mean = 616.329 us, max = 2.567 ms, min = 6.690 us, total = 414.173 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 672 total (0 active), Execution time: mean = 54.272 us, total = 36.471 ms, Queueing time: mean = 114.754 us, max = 4.779 ms, min = 11.561 us, total = 77.115 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 336 total (1 active), 
Execution time: mean = 1.766 ms, total = 593.448 ms, Queueing time: mean = 68.574 us, max = 163.431 us, min = 11.609 us, total = 23.041 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 56 total (1 active, 1 running), Execution time: mean = 2.709 ms, total = 151.678 ms, Queueing time: mean = 74.197 us, max = 215.454 us, min = 15.835 us, total = 4.155 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 7 total (1 active), Execution time: mean = 428.229 s, total = 2997.604 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 6 total (0 active), Execution time: mean = 
351.177 us, total = 2.107 ms, Queueing time: mean = 117.415 us, max = 183.286 us, min = 20.299 us, total = 704.491 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 7.355 us, total = 29.419 us, Queueing time: mean = 66.631 us, max = 97.290 us, min = 83.871 us, total = 266.523 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:58:54,920 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:58:56,084 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 298619 total (35 active) [state-dump] Queueing time: mean = 287.359 us, 
max = 59.826 s, min = -0.001 s, total = 85.811 s [state-dump] Execution time: mean = 10.218 ms, total = 3051.235 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 71799 total (0 active), Execution time: mean = 527.171 us, total = 37.850 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 71799 total (0 active), Execution time: mean = 35.036 us, total = 2.516 s, Queueing time: mean = 109.310 us, max = 2.841 ms, min = 1.846 us, total = 7.848 s [state-dump] ObjectManager.UpdateAvailableMemory - 34168 total (0 active), Execution time: mean = 6.005 us, total = 205.175 ms, Queueing time: mean = 108.882 us, max = 1.662 ms, min = 2.228 us, total = 3.720 s [state-dump] NodeManager.CheckGC - 34168 total (1 active), Execution time: mean = 2.826 us, total = 96.545 ms, Queueing time: mean = 96.847 us, max = 25.875 ms, min = 2.848 us, total = 3.309 s [state-dump] RaySyncer.OnDemandBroadcasting - 34168 total (1 active), Execution time: mean = 10.357 us, total = 353.878 ms, Queueing time: mean = 90.204 us, max = 25.869 ms, min = 6.166 us, total = 3.082 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 17094 total (1 active), Execution time: mean = 18.004 us, total = 307.754 ms, Queueing time: mean = 75.903 us, max = 26.386 ms, min = -0.001 s, total = 1.297 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 13653 total (1 active), Execution time: mean = 452.662 us, total = 6.180 s, Queueing time: mean = 74.386 us, max = 3.532 ms, min = -0.000 s, total = 1.016 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3420 total (1 active), Execution time: mean = 3.079 us, total = 10.531 ms, Queueing time: mean = 181.442 us, max = 2.946 ms, min = 3.958 us, total = 620.530 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 3420 total (1 active), Execution 
time: mean = 8.935 us, total = 30.559 ms, Queueing time: mean = 177.428 us, max = 2.947 ms, min = 3.811 us, total = 606.803 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3420 total (1 active), Execution time: mean = 16.440 us, total = 56.223 ms, Queueing time: mean = 73.464 us, max = 2.581 ms, min = 10.666 us, total = 251.246 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3419 total (0 active), Execution time: mean = 96.148 us, total = 328.731 ms, Queueing time: mean = 110.462 us, max = 2.934 ms, min = 4.027 us, total = 377.671 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3419 total (0 active), Execution time: mean = 604.970 us, total = 2.068 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1141 total (1 active), Execution time: mean = 10.140 us, total = 11.569 ms, Queueing time: mean = 74.792 us, max = 442.307 us, min = 14.043 us, total = 85.338 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 684 total (0 active), Execution time: mean = 1.577 ms, total = 1.078 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 684 total (1 active), Execution time: mean = 580.879 us, total = 397.321 ms, Queueing time: mean = 340.275 us, max = 2.010 ms, min = 9.115 us, total = 232.748 ms [state-dump] NodeManager.GcsCheckAlive - 684 total (1 active), Execution time: mean = 302.478 us, total = 206.895 ms, Queueing time: mean = 618.066 us, max = 2.567 ms, min = 6.690 us, total = 422.757 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 684 total (0 active), Execution time: mean = 54.359 us, total = 37.182 ms, Queueing time: mean = 114.798 us, max = 4.779 ms, min = 11.561 us, total = 78.522 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 342 total (1 active), 
Execution time: mean = 1.767 ms, total = 604.422 ms, Queueing time: mean = 68.678 us, max = 163.431 us, min = 11.609 us, total = 23.488 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 57 total (1 active, 1 running), Execution time: mean = 2.682 ms, total = 152.871 ms, Queueing time: mean = 73.511 us, max = 215.454 us, min = 15.835 us, total = 4.190 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 7 total (1 active), Execution time: mean = 428.229 s, total = 2997.604 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 6 total (0 active), Execution time: mean = 
351.177 us, total = 2.107 ms, Queueing time: mean = 117.415 us, max = 183.286 us, min = 20.299 us, total = 704.491 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 7.355 us, total = 29.419 us, Queueing time: mean = 66.631 us, max = 97.290 us, min = 83.871 us, total = 266.523 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:59:54,920 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:59:56,087 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 303852 total (35 active) [state-dump] Queueing time: mean = 283.733 us, 
max = 59.826 s, min = -0.001 s, total = 86.213 s [state-dump] Execution time: mean = 10.045 ms, total = 3052.149 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 73059 total (0 active), Execution time: mean = 527.000 us, total = 38.502 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 73059 total (0 active), Execution time: mean = 35.035 us, total = 2.560 s, Queueing time: mean = 109.235 us, max = 2.841 ms, min = 1.846 us, total = 7.981 s [state-dump] ObjectManager.UpdateAvailableMemory - 34768 total (0 active), Execution time: mean = 6.016 us, total = 209.154 ms, Queueing time: mean = 108.969 us, max = 1.662 ms, min = 2.228 us, total = 3.789 s [state-dump] NodeManager.CheckGC - 34768 total (1 active), Execution time: mean = 2.830 us, total = 98.381 ms, Queueing time: mean = 96.944 us, max = 25.875 ms, min = 2.848 us, total = 3.371 s [state-dump] RaySyncer.OnDemandBroadcasting - 34768 total (1 active), Execution time: mean = 10.385 us, total = 361.070 ms, Queueing time: mean = 90.277 us, max = 25.869 ms, min = 6.166 us, total = 3.139 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 17393 total (1 active), Execution time: mean = 18.046 us, total = 313.866 ms, Queueing time: mean = 75.880 us, max = 26.386 ms, min = -0.001 s, total = 1.320 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 13892 total (1 active), Execution time: mean = 452.715 us, total = 6.289 s, Queueing time: mean = 74.395 us, max = 3.532 ms, min = -0.000 s, total = 1.033 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3480 total (1 active), Execution time: mean = 3.082 us, total = 10.726 ms, Queueing time: mean = 181.166 us, max = 2.946 ms, min = 3.958 us, total = 630.459 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 3480 total (1 active), Execution 
time: mean = 8.975 us, total = 31.234 ms, Queueing time: mean = 177.123 us, max = 2.947 ms, min = 3.811 us, total = 616.388 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3480 total (1 active), Execution time: mean = 16.500 us, total = 57.420 ms, Queueing time: mean = 73.515 us, max = 2.581 ms, min = 10.666 us, total = 255.832 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3479 total (0 active), Execution time: mean = 96.172 us, total = 334.584 ms, Queueing time: mean = 110.415 us, max = 2.934 ms, min = 4.027 us, total = 384.133 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3479 total (0 active), Execution time: mean = 605.161 us, total = 2.105 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1161 total (1 active), Execution time: mean = 10.147 us, total = 11.780 ms, Queueing time: mean = 74.935 us, max = 442.307 us, min = 14.043 us, total = 87.000 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 696 total (0 active), Execution time: mean = 1.578 ms, total = 1.098 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 696 total (1 active), Execution time: mean = 580.699 us, total = 404.167 ms, Queueing time: mean = 339.184 us, max = 2.010 ms, min = 9.115 us, total = 236.072 ms [state-dump] NodeManager.GcsCheckAlive - 696 total (1 active), Execution time: mean = 303.364 us, total = 211.142 ms, Queueing time: mean = 615.789 us, max = 2.567 ms, min = 6.690 us, total = 428.589 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 696 total (0 active), Execution time: mean = 54.522 us, total = 37.947 ms, Queueing time: mean = 114.389 us, max = 4.779 ms, min = 11.561 us, total = 79.614 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 348 total (1 active), 
Execution time: mean = 1.768 ms, total = 615.129 ms, Queueing time: mean = 68.749 us, max = 163.431 us, min = 11.609 us, total = 23.925 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 58 total (1 active, 1 running), Execution time: mean = 2.681 ms, total = 155.522 ms, Queueing time: mean = 74.020 us, max = 215.454 us, min = 15.835 us, total = 4.293 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 7 total (1 active), Execution time: mean = 428.229 s, total = 2997.604 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 6 total (0 active), Execution time: mean = 
351.177 us, total = 2.107 ms, Queueing time: mean = 117.415 us, max = 183.286 us, min = 20.299 us, total = 704.491 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 7.355 us, total = 29.419 us, Queueing time: mean = 66.631 us, max = 97.290 us, min = 83.871 us, total = 266.523 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:00:54,920 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:00:56,090 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 309084 total (35 active) [state-dump] Queueing time: mean = 280.060 us, 
max = 59.826 s, min = -0.001 s, total = 86.562 s [state-dump] Execution time: mean = 9.878 ms, total = 3052.988 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 74319 total (0 active), Execution time: mean = 525.990 us, total = 39.091 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 74319 total (0 active), Execution time: mean = 34.958 us, total = 2.598 s, Queueing time: mean = 108.880 us, max = 2.841 ms, min = 1.846 us, total = 8.092 s [state-dump] ObjectManager.UpdateAvailableMemory - 35367 total (0 active), Execution time: mean = 6.005 us, total = 212.386 ms, Queueing time: mean = 108.625 us, max = 1.662 ms, min = 2.228 us, total = 3.842 s [state-dump] NodeManager.CheckGC - 35367 total (1 active), Execution time: mean = 2.831 us, total = 100.121 ms, Queueing time: mean = 96.782 us, max = 25.875 ms, min = 2.848 us, total = 3.423 s [state-dump] RaySyncer.OnDemandBroadcasting - 35367 total (1 active), Execution time: mean = 10.384 us, total = 367.237 ms, Queueing time: mean = 90.117 us, max = 25.869 ms, min = 6.166 us, total = 3.187 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 17693 total (1 active), Execution time: mean = 18.020 us, total = 318.830 ms, Queueing time: mean = 75.692 us, max = 26.386 ms, min = -0.001 s, total = 1.339 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 14132 total (1 active), Execution time: mean = 452.527 us, total = 6.395 s, Queueing time: mean = 74.275 us, max = 3.532 ms, min = -0.000 s, total = 1.050 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3540 total (1 active), Execution time: mean = 3.081 us, total = 10.907 ms, Queueing time: mean = 181.408 us, max = 2.946 ms, min = 3.958 us, total = 642.183 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 3540 total (1 active), Execution 
time: mean = 8.971 us, total = 31.756 ms, Queueing time: mean = 177.365 us, max = 2.947 ms, min = 3.811 us, total = 627.871 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3540 total (1 active), Execution time: mean = 16.484 us, total = 58.352 ms, Queueing time: mean = 73.392 us, max = 2.581 ms, min = 10.666 us, total = 259.808 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3539 total (0 active), Execution time: mean = 96.187 us, total = 340.405 ms, Queueing time: mean = 110.258 us, max = 2.934 ms, min = 4.027 us, total = 390.203 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3539 total (0 active), Execution time: mean = 605.260 us, total = 2.142 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1181 total (1 active), Execution time: mean = 10.112 us, total = 11.942 ms, Queueing time: mean = 74.818 us, max = 442.307 us, min = 14.043 us, total = 88.360 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 708 total (0 active), Execution time: mean = 1.578 ms, total = 1.117 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 708 total (1 active), Execution time: mean = 581.605 us, total = 411.777 ms, Queueing time: mean = 339.434 us, max = 2.010 ms, min = 9.115 us, total = 240.320 ms [state-dump] NodeManager.GcsCheckAlive - 708 total (1 active), Execution time: mean = 303.530 us, total = 214.899 ms, Queueing time: mean = 616.846 us, max = 2.567 ms, min = 6.690 us, total = 436.727 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 708 total (0 active), Execution time: mean = 54.521 us, total = 38.601 ms, Queueing time: mean = 113.977 us, max = 4.779 ms, min = 11.561 us, total = 80.696 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 354 total (1 active), 
Execution time: mean = 1.770 ms, total = 626.472 ms, Queueing time: mean = 68.693 us, max = 163.431 us, min = 11.609 us, total = 24.317 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 59 total (1 active, 1 running), Execution time: mean = 2.685 ms, total = 158.412 ms, Queueing time: mean = 73.878 us, max = 215.454 us, min = 15.835 us, total = 4.359 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 7 total (1 active), Execution time: mean = 428.229 s, total = 2997.604 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 6 total (0 active), Execution time: mean = 
351.177 us, total = 2.107 ms, Queueing time: mean = 117.415 us, max = 183.286 us, min = 20.299 us, total = 704.491 us [state-dump] NodeManager.GCTaskFailureReason - 4 total (1 active), Execution time: mean = 7.355 us, total = 29.419 us, Queueing time: mean = 66.631 us, max = 97.290 us, min = 83.871 us, total = 266.523 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:01:54,920 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:01:56,093 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 314321 total (35 active) [state-dump] Queueing time: mean = 276.760 us, 
max = 59.826 s, min = -0.001 s, total = 86.992 s [state-dump] Execution time: mean = 11.625 ms, total = 3653.957 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 75579 total (0 active), Execution time: mean = 526.298 us, total = 39.777 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 75579 total (0 active), Execution time: mean = 35.020 us, total = 2.647 s, Queueing time: mean = 108.934 us, max = 2.841 ms, min = 1.846 us, total = 8.233 s [state-dump] ObjectManager.UpdateAvailableMemory - 35967 total (0 active), Execution time: mean = 6.023 us, total = 216.613 ms, Queueing time: mean = 108.821 us, max = 1.662 ms, min = 2.228 us, total = 3.914 s [state-dump] NodeManager.CheckGC - 35967 total (1 active), Execution time: mean = 2.839 us, total = 102.128 ms, Queueing time: mean = 96.861 us, max = 25.875 ms, min = 2.848 us, total = 3.484 s [state-dump] RaySyncer.OnDemandBroadcasting - 35967 total (1 active), Execution time: mean = 10.415 us, total = 374.589 ms, Queueing time: mean = 90.174 us, max = 25.869 ms, min = 6.166 us, total = 3.243 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 17993 total (1 active), Execution time: mean = 18.120 us, total = 326.028 ms, Queueing time: mean = 75.872 us, max = 26.386 ms, min = -0.001 s, total = 1.365 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 14371 total (1 active), Execution time: mean = 452.921 us, total = 6.509 s, Queueing time: mean = 74.367 us, max = 3.532 ms, min = -0.000 s, total = 1.069 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3600 total (1 active), Execution time: mean = 3.089 us, total = 11.121 ms, Queueing time: mean = 181.806 us, max = 2.946 ms, min = 3.958 us, total = 654.501 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 3600 total (1 active), Execution 
time: mean = 9.008 us, total = 32.428 ms, Queueing time: mean = 177.745 us, max = 2.947 ms, min = 3.811 us, total = 639.880 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3600 total (1 active), Execution time: mean = 16.577 us, total = 59.677 ms, Queueing time: mean = 73.568 us, max = 2.581 ms, min = 10.666 us, total = 264.846 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3599 total (0 active), Execution time: mean = 96.341 us, total = 346.730 ms, Queueing time: mean = 110.514 us, max = 2.934 ms, min = 4.027 us, total = 397.739 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3599 total (0 active), Execution time: mean = 606.471 us, total = 2.183 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1201 total (1 active), Execution time: mean = 10.120 us, total = 12.154 ms, Queueing time: mean = 74.805 us, max = 442.307 us, min = 14.043 us, total = 89.841 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 720 total (0 active), Execution time: mean = 1.581 ms, total = 1.138 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 720 total (1 active), Execution time: mean = 581.887 us, total = 418.959 ms, Queueing time: mean = 341.248 us, max = 2.010 ms, min = 9.115 us, total = 245.699 ms [state-dump] NodeManager.GcsCheckAlive - 720 total (1 active), Execution time: mean = 304.682 us, total = 219.371 ms, Queueing time: mean = 617.929 us, max = 2.567 ms, min = 6.690 us, total = 444.909 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 720 total (0 active), Execution time: mean = 54.705 us, total = 39.388 ms, Queueing time: mean = 113.991 us, max = 4.779 ms, min = 11.561 us, total = 82.073 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 360 total (1 active), 
Execution time: mean = 1.774 ms, total = 638.724 ms, Queueing time: mean = 68.834 us, max = 163.431 us, min = 11.609 us, total = 24.780 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 60 total (1 active, 1 running), Execution time: mean = 2.686 ms, total = 161.184 ms, Queueing time: mean = 73.762 us, max = 215.454 us, min = 15.835 us, total = 4.426 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 8 total (1 active), Execution time: mean = 449.701 s, total = 3597.605 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 7 total (0 active), Execution time: mean = 
367.018 us, total = 2.569 ms, Queueing time: mean = 127.648 us, max = 189.048 us, min = 20.299 us, total = 893.539 us [state-dump] NodeManager.GCTaskFailureReason - 5 total (1 active), Execution time: mean = 6.901 us, total = 34.506 us, Queueing time: mean = 58.173 us, max = 97.290 us, min = 24.344 us, total = 290.867 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:02:54,921 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:02:56,095 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 319553 total (35 active) [state-dump] Queueing time: mean = 273.410 us, 
max = 59.826 s, min = -0.001 s, total = 87.369 s [state-dump] Execution time: mean = 11.437 ms, total = 3654.866 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 76839 total (0 active), Execution time: mean = 526.149 us, total = 40.429 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 76839 total (0 active), Execution time: mean = 34.984 us, total = 2.688 s, Queueing time: mean = 108.788 us, max = 2.841 ms, min = 1.846 us, total = 8.359 s [state-dump] ObjectManager.UpdateAvailableMemory - 36566 total (0 active), Execution time: mean = 6.018 us, total = 220.068 ms, Queueing time: mean = 108.729 us, max = 1.662 ms, min = 2.228 us, total = 3.976 s [state-dump] NodeManager.CheckGC - 36566 total (1 active), Execution time: mean = 2.840 us, total = 103.832 ms, Queueing time: mean = 96.752 us, max = 25.875 ms, min = 2.848 us, total = 3.538 s [state-dump] RaySyncer.OnDemandBroadcasting - 36566 total (1 active), Execution time: mean = 10.409 us, total = 380.606 ms, Queueing time: mean = 90.071 us, max = 25.869 ms, min = 6.166 us, total = 3.294 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 18293 total (1 active), Execution time: mean = 18.103 us, total = 331.166 ms, Queueing time: mean = 75.796 us, max = 26.386 ms, min = -0.001 s, total = 1.387 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 14611 total (1 active), Execution time: mean = 453.059 us, total = 6.620 s, Queueing time: mean = 74.269 us, max = 3.532 ms, min = -0.000 s, total = 1.085 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3660 total (1 active), Execution time: mean = 3.088 us, total = 11.302 ms, Queueing time: mean = 181.889 us, max = 2.946 ms, min = 3.958 us, total = 665.715 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 3660 total (1 active), Execution 
time: mean = 9.006 us, total = 32.964 ms, Queueing time: mean = 177.826 us, max = 2.947 ms, min = 3.811 us, total = 650.844 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3660 total (1 active), Execution time: mean = 16.545 us, total = 60.554 ms, Queueing time: mean = 73.522 us, max = 2.581 ms, min = 10.666 us, total = 269.091 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3659 total (0 active), Execution time: mean = 96.297 us, total = 352.352 ms, Queueing time: mean = 110.446 us, max = 2.934 ms, min = 4.027 us, total = 404.121 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3659 total (0 active), Execution time: mean = 606.672 us, total = 2.220 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1221 total (1 active), Execution time: mean = 10.086 us, total = 12.315 ms, Queueing time: mean = 74.566 us, max = 442.307 us, min = 14.043 us, total = 91.046 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 732 total (0 active), Execution time: mean = 1.581 ms, total = 1.158 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 732 total (1 active), Execution time: mean = 581.884 us, total = 425.939 ms, Queueing time: mean = 341.491 us, max = 2.010 ms, min = 9.115 us, total = 249.972 ms [state-dump] NodeManager.GcsCheckAlive - 732 total (1 active), Execution time: mean = 304.545 us, total = 222.927 ms, Queueing time: mean = 618.271 us, max = 2.567 ms, min = 6.690 us, total = 452.575 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 732 total (0 active), Execution time: mean = 54.640 us, total = 39.997 ms, Queueing time: mean = 113.730 us, max = 4.779 ms, min = 11.561 us, total = 83.251 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 366 total (1 active), 
Execution time: mean = 1.775 ms, total = 649.590 ms, Queueing time: mean = 68.931 us, max = 163.431 us, min = 11.609 us, total = 25.229 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 61 total (1 active, 1 running), Execution time: mean = 2.689 ms, total = 164.034 ms, Queueing time: mean = 72.960 us, max = 215.454 us, min = 15.835 us, total = 4.451 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 8 total (1 active), Execution time: mean = 449.701 s, total = 3597.605 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 7 total (0 active), Execution time: mean = 
367.018 us, total = 2.569 ms, Queueing time: mean = 127.648 us, max = 189.048 us, min = 20.299 us, total = 893.539 us [state-dump] NodeManager.GCTaskFailureReason - 5 total (1 active), Execution time: mean = 6.901 us, total = 34.506 us, Queueing time: mean = 58.173 us, max = 97.290 us, min = 24.344 us, total = 290.867 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:03:54,921 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:03:56,099 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 324784 total (35 active) [state-dump] Queueing time: mean = 270.301 us, 
max = 59.826 s, min = -0.001 s, total = 87.789 s [state-dump] Execution time: mean = 11.256 ms, total = 3655.809 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 78099 total (0 active), Execution time: mean = 526.334 us, total = 41.106 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 78099 total (0 active), Execution time: mean = 35.010 us, total = 2.734 s, Queueing time: mean = 108.843 us, max = 2.841 ms, min = 1.846 us, total = 8.501 s [state-dump] ObjectManager.UpdateAvailableMemory - 37165 total (0 active), Execution time: mean = 6.023 us, total = 223.854 ms, Queueing time: mean = 108.876 us, max = 1.662 ms, min = 2.228 us, total = 4.046 s [state-dump] NodeManager.CheckGC - 37165 total (1 active), Execution time: mean = 2.841 us, total = 105.567 ms, Queueing time: mean = 96.850 us, max = 25.875 ms, min = 2.848 us, total = 3.599 s [state-dump] RaySyncer.OnDemandBroadcasting - 37165 total (1 active), Execution time: mean = 10.421 us, total = 387.309 ms, Queueing time: mean = 90.159 us, max = 25.869 ms, min = 6.166 us, total = 3.351 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 18593 total (1 active), Execution time: mean = 18.103 us, total = 336.597 ms, Queueing time: mean = 75.829 us, max = 26.386 ms, min = -0.001 s, total = 1.410 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 14850 total (1 active), Execution time: mean = 453.268 us, total = 6.731 s, Queueing time: mean = 74.321 us, max = 3.532 ms, min = -0.000 s, total = 1.104 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3720 total (1 active), Execution time: mean = 3.085 us, total = 11.475 ms, Queueing time: mean = 181.798 us, max = 2.946 ms, min = 3.958 us, total = 676.289 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 3720 total (1 active), Execution 
time: mean = 8.995 us, total = 33.461 ms, Queueing time: mean = 177.739 us, max = 2.947 ms, min = 3.811 us, total = 661.190 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3720 total (1 active), Execution time: mean = 16.573 us, total = 61.652 ms, Queueing time: mean = 73.636 us, max = 2.581 ms, min = 10.666 us, total = 273.927 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3719 total (0 active), Execution time: mean = 96.307 us, total = 358.167 ms, Queueing time: mean = 110.532 us, max = 2.934 ms, min = 4.027 us, total = 411.067 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3719 total (0 active), Execution time: mean = 607.248 us, total = 2.258 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1241 total (1 active), Execution time: mean = 10.096 us, total = 12.530 ms, Queueing time: mean = 74.730 us, max = 442.307 us, min = 14.043 us, total = 92.740 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 744 total (0 active), Execution time: mean = 1.583 ms, total = 1.178 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 744 total (1 active), Execution time: mean = 581.251 us, total = 432.451 ms, Queueing time: mean = 341.849 us, max = 2.010 ms, min = 9.115 us, total = 254.335 ms [state-dump] NodeManager.GcsCheckAlive - 744 total (1 active), Execution time: mean = 304.878 us, total = 226.829 ms, Queueing time: mean = 617.605 us, max = 2.567 ms, min = 6.690 us, total = 459.498 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 744 total (0 active), Execution time: mean = 54.670 us, total = 40.674 ms, Queueing time: mean = 113.811 us, max = 4.779 ms, min = 11.561 us, total = 84.675 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 372 total (1 active), 
Execution time: mean = 1.774 ms, total = 660.024 ms, Queueing time: mean = 68.907 us, max = 163.431 us, min = 11.609 us, total = 25.633 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 62 total (1 active, 1 running), Execution time: mean = 2.690 ms, total = 166.771 ms, Queueing time: mean = 72.670 us, max = 215.454 us, min = 15.835 us, total = 4.506 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 8 total (1 active), Execution time: mean = 449.701 s, total = 3597.605 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 7 total (0 active), Execution time: mean = 
367.018 us, total = 2.569 ms, Queueing time: mean = 127.648 us, max = 189.048 us, min = 20.299 us, total = 893.539 us [state-dump] NodeManager.GCTaskFailureReason - 5 total (1 active), Execution time: mean = 6.901 us, total = 34.506 us, Queueing time: mean = 58.173 us, max = 97.290 us, min = 24.344 us, total = 290.867 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 00:04:54,922 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:04:56,102 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 330019 total (35 active) [state-dump] Queueing time: mean = 267.310 us, 
max = 59.826 s, min = -0.001 s, total = 88.217 s [state-dump] Execution time: mean = 11.081 ms, total = 3656.796 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 79359 total (0 active), Execution time: mean = 526.968 us, total = 41.820 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 79359 total (0 active), Execution time: mean = 35.056 us, total = 2.782 s, Queueing time: mean = 109.035 us, max = 2.841 ms, min = 1.846 us, total = 8.653 s [state-dump] ObjectManager.UpdateAvailableMemory - 37765 total (0 active), Execution time: mean = 6.030 us, total = 227.719 ms, Queueing time: mean = 109.051 us, max = 1.662 ms, min = 2.228 us, total = 4.118 s [state-dump] NodeManager.CheckGC - 37765 total (1 active), Execution time: mean = 2.842 us, total = 107.326 ms, Queueing time: mean = 96.849 us, max = 25.875 ms, min = 2.848 us, total = 3.658 s [state-dump] RaySyncer.OnDemandBroadcasting - 37765 total (1 active), Execution time: mean = 10.422 us, total = 393.581 ms, Queueing time: mean = 90.161 us, max = 25.869 ms, min = 6.166 us, total = 3.405 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 18893 total (1 active), Execution time: mean = 18.116 us, total = 342.273 ms, Queueing time: mean = 75.988 us, max = 26.386 ms, min = -0.001 s, total = 1.436 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 15090 total (1 active), Execution time: mean = 453.632 us, total = 6.845 s, Queueing time: mean = 74.392 us, max = 3.532 ms, min = -0.000 s, total = 1.123 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3780 total (1 active), Execution time: mean = 3.086 us, total = 11.665 ms, Queueing time: mean = 181.712 us, max = 2.946 ms, min = 3.958 us, total = 686.870 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 3780 total (1 active), Execution 
time: mean = 9.004 us, total = 34.033 ms, Queueing time: mean = 177.649 us, max = 2.947 ms, min = 3.811 us, total = 671.514 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3780 total (1 active), Execution time: mean = 16.579 us, total = 62.667 ms, Queueing time: mean = 73.616 us, max = 2.581 ms, min = 10.666 us, total = 278.267 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3779 total (0 active), Execution time: mean = 96.321 us, total = 363.999 ms, Queueing time: mean = 110.714 us, max = 2.934 ms, min = 4.027 us, total = 418.389 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3779 total (0 active), Execution time: mean = 608.334 us, total = 2.299 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1261 total (1 active), Execution time: mean = 10.082 us, total = 12.714 ms, Queueing time: mean = 74.843 us, max = 442.307 us, min = 14.043 us, total = 94.378 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 756 total (0 active), Execution time: mean = 1.585 ms, total = 1.198 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 756 total (1 active), Execution time: mean = 581.552 us, total = 439.653 ms, Queueing time: mean = 341.117 us, max = 2.010 ms, min = 9.115 us, total = 257.884 ms [state-dump] NodeManager.GcsCheckAlive - 756 total (1 active), Execution time: mean = 304.658 us, total = 230.322 ms, Queueing time: mean = 617.362 us, max = 2.567 ms, min = 6.690 us, total = 466.726 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 756 total (0 active), Execution time: mean = 54.703 us, total = 41.355 ms, Queueing time: mean = 113.776 us, max = 4.779 ms, min = 11.561 us, total = 86.015 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 378 total (1 active), 
Execution time: mean = 1.773 ms, total = 670.382 ms, Queueing time: mean = 68.881 us, max = 163.431 us, min = 11.609 us, total = 26.037 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 63 total (1 active, 1 running), Execution time: mean = 2.699 ms, total = 170.041 ms, Queueing time: mean = 72.652 us, max = 215.454 us, min = 15.835 us, total = 4.577 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 8 total (1 active), Execution time: mean = 449.701 s, total = 3597.605 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 7 total (0 active), Execution time: mean = 
367.018 us, total = 2.569 ms, Queueing time: mean = 127.648 us, max = 189.048 us, min = 20.299 us, total = 893.539 us [state-dump] NodeManager.GCTaskFailureReason - 5 total (1 active), Execution time: mean = 6.901 us, total = 34.506 us, Queueing time: mean = 58.173 us, max = 97.290 us, min = 24.344 us, total = 290.867 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:05:54,922 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:05:56,104 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 335248 total (35 active) [state-dump] Queueing time: mean = 264.486 us, 
max = 59.826 s, min = -0.001 s, total = 88.668 s [state-dump] Execution time: mean = 10.911 ms, total = 3657.760 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 80618 total (0 active), Execution time: mean = 527.336 us, total = 42.513 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 80618 total (0 active), Execution time: mean = 35.081 us, total = 2.828 s, Queueing time: mean = 109.191 us, max = 2.841 ms, min = 1.846 us, total = 8.803 s [state-dump] ObjectManager.UpdateAvailableMemory - 38364 total (0 active), Execution time: mean = 6.039 us, total = 231.696 ms, Queueing time: mean = 109.179 us, max = 1.662 ms, min = 2.228 us, total = 4.189 s [state-dump] NodeManager.CheckGC - 38364 total (1 active), Execution time: mean = 2.862 us, total = 109.788 ms, Queueing time: mean = 97.153 us, max = 25.875 ms, min = -0.000 s, total = 3.727 s [state-dump] RaySyncer.OnDemandBroadcasting - 38364 total (1 active), Execution time: mean = 10.437 us, total = 400.396 ms, Queueing time: mean = 90.469 us, max = 25.869 ms, min = 6.166 us, total = 3.471 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 19193 total (1 active), Execution time: mean = 18.164 us, total = 348.621 ms, Queueing time: mean = 76.068 us, max = 26.386 ms, min = -0.001 s, total = 1.460 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 15329 total (1 active), Execution time: mean = 453.826 us, total = 6.957 s, Queueing time: mean = 74.462 us, max = 3.532 ms, min = -0.000 s, total = 1.141 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3840 total (1 active), Execution time: mean = 3.087 us, total = 11.855 ms, Queueing time: mean = 182.021 us, max = 2.946 ms, min = 3.958 us, total = 698.959 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 3840 total (1 active), Execution 
time: mean = 9.012 us, total = 34.604 ms, Queueing time: mean = 177.954 us, max = 2.947 ms, min = 3.811 us, total = 683.344 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3840 total (1 active), Execution time: mean = 16.611 us, total = 63.787 ms, Queueing time: mean = 73.605 us, max = 2.581 ms, min = 10.666 us, total = 282.644 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3839 total (0 active), Execution time: mean = 96.336 us, total = 369.833 ms, Queueing time: mean = 110.873 us, max = 2.934 ms, min = 4.027 us, total = 425.640 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3839 total (0 active), Execution time: mean = 608.909 us, total = 2.338 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1281 total (1 active), Execution time: mean = 10.066 us, total = 12.895 ms, Queueing time: mean = 75.297 us, max = 442.307 us, min = 14.043 us, total = 96.455 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 768 total (0 active), Execution time: mean = 1.586 ms, total = 1.218 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 768 total (1 active), Execution time: mean = 582.234 us, total = 447.156 ms, Queueing time: mean = 341.853 us, max = 2.010 ms, min = 9.115 us, total = 262.543 ms [state-dump] NodeManager.GcsCheckAlive - 768 total (1 active), Execution time: mean = 305.363 us, total = 234.519 ms, Queueing time: mean = 618.115 us, max = 2.567 ms, min = 6.690 us, total = 474.712 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 768 total (0 active), Execution time: mean = 54.707 us, total = 42.015 ms, Queueing time: mean = 113.666 us, max = 4.779 ms, min = 11.561 us, total = 87.296 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 384 total (1 active), 
Execution time: mean = 1.776 ms, total = 681.936 ms, Queueing time: mean = 68.947 us, max = 163.431 us, min = 11.609 us, total = 26.476 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 64 total (1 active, 1 running), Execution time: mean = 2.697 ms, total = 172.628 ms, Queueing time: mean = 73.828 us, max = 215.454 us, min = 15.835 us, total = 4.725 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 8 total (1 active), Execution time: mean = 449.701 s, total = 3597.605 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 7 total (0 active), Execution time: mean = 
367.018 us, total = 2.569 ms, Queueing time: mean = 127.648 us, max = 189.048 us, min = 20.299 us, total = 893.539 us [state-dump] NodeManager.GCTaskFailureReason - 5 total (1 active), Execution time: mean = 6.901 us, total = 34.506 us, Queueing time: mean = 58.173 us, max = 97.290 us, min = 24.344 us, total = 290.867 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 00:06:54,922 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:06:56,107 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 340483 total (35 active) [state-dump] Queueing time: mean = 261.632 us, 
max = 59.826 s, min = -0.001 s, total = 89.081 s [state-dump] Execution time: mean = 10.746 ms, total = 3658.700 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 81878 total (0 active), Execution time: mean = 527.539 us, total = 43.194 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 81878 total (0 active), Execution time: mean = 35.102 us, total = 2.874 s, Queueing time: mean = 109.301 us, max = 2.841 ms, min = 1.846 us, total = 8.949 s [state-dump] ObjectManager.UpdateAvailableMemory - 38964 total (0 active), Execution time: mean = 6.042 us, total = 235.416 ms, Queueing time: mean = 109.272 us, max = 1.662 ms, min = 2.228 us, total = 4.258 s [state-dump] NodeManager.CheckGC - 38964 total (1 active), Execution time: mean = 2.862 us, total = 111.500 ms, Queueing time: mean = 97.177 us, max = 25.875 ms, min = -0.000 s, total = 3.786 s [state-dump] RaySyncer.OnDemandBroadcasting - 38964 total (1 active), Execution time: mean = 10.440 us, total = 406.766 ms, Queueing time: mean = 90.490 us, max = 25.869 ms, min = 6.166 us, total = 3.526 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 19493 total (1 active), Execution time: mean = 18.160 us, total = 353.999 ms, Queueing time: mean = 75.992 us, max = 26.386 ms, min = -0.001 s, total = 1.481 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 15569 total (1 active), Execution time: mean = 453.834 us, total = 7.066 s, Queueing time: mean = 74.508 us, max = 3.532 ms, min = -0.000 s, total = 1.160 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3900 total (1 active), Execution time: mean = 3.087 us, total = 12.039 ms, Queueing time: mean = 181.521 us, max = 2.946 ms, min = 3.845 us, total = 707.932 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 3900 total (1 active), Execution 
time: mean = 9.020 us, total = 35.180 ms, Queueing time: mean = 177.450 us, max = 2.947 ms, min = 3.811 us, total = 692.054 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3900 total (1 active), Execution time: mean = 16.635 us, total = 64.878 ms, Queueing time: mean = 73.683 us, max = 2.581 ms, min = 10.666 us, total = 287.365 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3899 total (0 active), Execution time: mean = 96.367 us, total = 375.737 ms, Queueing time: mean = 111.103 us, max = 2.934 ms, min = 4.027 us, total = 433.190 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3899 total (0 active), Execution time: mean = 609.336 us, total = 2.376 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1301 total (1 active), Execution time: mean = 10.048 us, total = 13.072 ms, Queueing time: mean = 75.590 us, max = 442.307 us, min = 14.043 us, total = 98.342 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 780 total (0 active), Execution time: mean = 1.587 ms, total = 1.238 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 780 total (1 active), Execution time: mean = 580.557 us, total = 452.835 ms, Queueing time: mean = 341.088 us, max = 2.010 ms, min = 9.115 us, total = 266.049 ms [state-dump] NodeManager.GcsCheckAlive - 780 total (1 active), Execution time: mean = 305.316 us, total = 238.147 ms, Queueing time: mean = 615.807 us, max = 2.567 ms, min = 6.690 us, total = 480.330 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 780 total (0 active), Execution time: mean = 54.649 us, total = 42.626 ms, Queueing time: mean = 113.613 us, max = 4.779 ms, min = 11.561 us, total = 88.618 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 390 total (1 active), 
Execution time: mean = 1.771 ms, total = 690.882 ms, Queueing time: mean = 69.481 us, max = 176.385 us, min = 11.609 us, total = 27.098 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 65 total (1 active, 1 running), Execution time: mean = 2.700 ms, total = 175.472 ms, Queueing time: mean = 73.950 us, max = 215.454 us, min = 15.835 us, total = 4.807 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 8 total (1 active), Execution time: mean = 449.701 s, total = 3597.605 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 7 total (0 active), Execution time: mean = 
367.018 us, total = 2.569 ms, Queueing time: mean = 127.648 us, max = 189.048 us, min = 20.299 us, total = 893.539 us [state-dump] NodeManager.GCTaskFailureReason - 5 total (1 active), Execution time: mean = 6.901 us, total = 34.506 us, Queueing time: mean = 58.173 us, max = 97.290 us, min = 24.344 us, total = 290.867 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:07:54,923 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:07:56,110 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 345709 total (35 active) [state-dump] Queueing time: mean = 258.914 us, 
max = 59.826 s, min = -0.001 s, total = 89.509 s [state-dump] Execution time: mean = 10.586 ms, total = 3659.656 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 83136 total (0 active), Execution time: mean = 527.863 us, total = 43.884 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 83136 total (0 active), Execution time: mean = 35.120 us, total = 2.920 s, Queueing time: mean = 109.412 us, max = 2.841 ms, min = 1.846 us, total = 9.096 s [state-dump] ObjectManager.UpdateAvailableMemory - 39563 total (0 active), Execution time: mean = 6.050 us, total = 239.344 ms, Queueing time: mean = 109.422 us, max = 1.662 ms, min = 2.228 us, total = 4.329 s [state-dump] NodeManager.CheckGC - 39563 total (1 active), Execution time: mean = 2.862 us, total = 113.240 ms, Queueing time: mean = 97.221 us, max = 25.875 ms, min = -0.000 s, total = 3.846 s [state-dump] RaySyncer.OnDemandBroadcasting - 39563 total (1 active), Execution time: mean = 10.448 us, total = 413.368 ms, Queueing time: mean = 90.527 us, max = 25.869 ms, min = 6.166 us, total = 3.582 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 19792 total (1 active), Execution time: mean = 18.195 us, total = 360.112 ms, Queueing time: mean = 76.064 us, max = 26.386 ms, min = -0.001 s, total = 1.505 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 15808 total (1 active), Execution time: mean = 453.997 us, total = 7.177 s, Queueing time: mean = 74.542 us, max = 3.532 ms, min = -0.000 s, total = 1.178 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 3960 total (1 active), Execution time: mean = 3.086 us, total = 12.220 ms, Queueing time: mean = 181.804 us, max = 2.946 ms, min = 3.845 us, total = 719.945 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 3960 total (1 active), Execution 
time: mean = 9.021 us, total = 35.724 ms, Queueing time: mean = 177.731 us, max = 2.947 ms, min = 3.811 us, total = 703.814 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 3960 total (1 active), Execution time: mean = 16.639 us, total = 65.891 ms, Queueing time: mean = 73.699 us, max = 2.581 ms, min = 10.666 us, total = 291.849 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3959 total (0 active), Execution time: mean = 96.392 us, total = 381.616 ms, Queueing time: mean = 111.211 us, max = 2.934 ms, min = 4.027 us, total = 440.283 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 3959 total (0 active), Execution time: mean = 609.390 us, total = 2.413 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1321 total (1 active), Execution time: mean = 10.035 us, total = 13.256 ms, Queueing time: mean = 75.703 us, max = 442.307 us, min = 14.043 us, total = 100.003 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 792 total (0 active), Execution time: mean = 1.588 ms, total = 1.258 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 792 total (1 active), Execution time: mean = 580.587 us, total = 459.825 ms, Queueing time: mean = 342.209 us, max = 2.010 ms, min = 9.115 us, total = 271.029 ms [state-dump] NodeManager.GcsCheckAlive - 792 total (1 active), Execution time: mean = 306.123 us, total = 242.449 ms, Queueing time: mean = 616.237 us, max = 2.567 ms, min = 6.690 us, total = 488.060 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 792 total (0 active), Execution time: mean = 54.651 us, total = 43.284 ms, Queueing time: mean = 113.528 us, max = 4.779 ms, min = 11.561 us, total = 89.914 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 396 total (1 active), 
Execution time: mean = 1.773 ms, total = 701.980 ms, Queueing time: mean = 69.610 us, max = 176.385 us, min = 11.609 us, total = 27.565 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 66 total (1 active, 1 running), Execution time: mean = 2.692 ms, total = 177.670 ms, Queueing time: mean = 73.627 us, max = 215.454 us, min = 15.835 us, total = 4.859 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 8 total (1 active), Execution time: mean = 449.701 s, total = 3597.605 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 7 total (0 active), Execution time: mean = 
367.018 us, total = 2.569 ms, Queueing time: mean = 127.648 us, max = 189.048 us, min = 20.299 us, total = 893.539 us [state-dump] NodeManager.GCTaskFailureReason - 5 total (1 active), Execution time: mean = 6.901 us, total = 34.506 us, Queueing time: mean = 58.173 us, max = 97.290 us, min = 24.344 us, total = 290.867 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:08:54,923 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:08:56,112 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 350941 total (35 active) [state-dump] Queueing time: mean = 256.283 us, 
max = 59.826 s, min = -0.001 s, total = 89.940 s [state-dump] Execution time: mean = 10.431 ms, total = 3660.609 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 84396 total (0 active), Execution time: mean = 528.129 us, total = 44.572 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 84396 total (0 active), Execution time: mean = 35.124 us, total = 2.964 s, Queueing time: mean = 109.526 us, max = 2.841 ms, min = 1.846 us, total = 9.244 s [state-dump] ObjectManager.UpdateAvailableMemory - 40162 total (0 active), Execution time: mean = 6.054 us, total = 243.159 ms, Queueing time: mean = 109.491 us, max = 1.662 ms, min = 2.228 us, total = 4.397 s [state-dump] NodeManager.CheckGC - 40162 total (1 active), Execution time: mean = 2.863 us, total = 114.997 ms, Queueing time: mean = 97.304 us, max = 25.875 ms, min = -0.000 s, total = 3.908 s [state-dump] RaySyncer.OnDemandBroadcasting - 40162 total (1 active), Execution time: mean = 10.458 us, total = 420.024 ms, Queueing time: mean = 90.600 us, max = 25.869 ms, min = 6.166 us, total = 3.639 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 20092 total (1 active), Execution time: mean = 18.215 us, total = 365.981 ms, Queueing time: mean = 76.290 us, max = 26.386 ms, min = -0.001 s, total = 1.533 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 16048 total (1 active), Execution time: mean = 454.199 us, total = 7.289 s, Queueing time: mean = 74.606 us, max = 3.532 ms, min = -0.000 s, total = 1.197 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 4020 total (1 active), Execution time: mean = 3.088 us, total = 12.413 ms, Queueing time: mean = 181.962 us, max = 2.946 ms, min = 3.845 us, total = 731.486 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 4020 total (1 active), Execution 
time: mean = 9.042 us, total = 36.349 ms, Queueing time: mean = 177.874 us, max = 2.947 ms, min = 3.811 us, total = 715.054 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 4020 total (1 active), Execution time: mean = 16.646 us, total = 66.917 ms, Queueing time: mean = 73.775 us, max = 2.581 ms, min = 10.666 us, total = 296.576 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 4019 total (0 active), Execution time: mean = 96.470 us, total = 387.714 ms, Queueing time: mean = 111.265 us, max = 2.934 ms, min = 4.027 us, total = 447.173 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 4019 total (0 active), Execution time: mean = 609.498 us, total = 2.450 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1341 total (1 active), Execution time: mean = 10.025 us, total = 13.444 ms, Queueing time: mean = 75.787 us, max = 442.307 us, min = 14.043 us, total = 101.630 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 804 total (0 active), Execution time: mean = 1.589 ms, total = 1.278 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 804 total (1 active), Execution time: mean = 580.903 us, total = 467.046 ms, Queueing time: mean = 343.013 us, max = 2.010 ms, min = 9.115 us, total = 275.782 ms [state-dump] NodeManager.GcsCheckAlive - 804 total (1 active), Execution time: mean = 306.413 us, total = 246.356 ms, Queueing time: mean = 616.955 us, max = 2.567 ms, min = 6.690 us, total = 496.032 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 804 total (0 active), Execution time: mean = 54.710 us, total = 43.987 ms, Queueing time: mean = 113.114 us, max = 4.779 ms, min = 11.561 us, total = 90.943 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 402 total (1 active), 
Execution time: mean = 1.775 ms, total = 713.532 ms, Queueing time: mean = 69.967 us, max = 176.385 us, min = 11.609 us, total = 28.127 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 67 total (1 active, 1 running), Execution time: mean = 2.693 ms, total = 180.401 ms, Queueing time: mean = 74.417 us, max = 215.454 us, min = 15.835 us, total = 4.986 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 8 total (1 active), Execution time: mean = 449.701 s, total = 3597.605 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 7 total (0 active), Execution time: mean = 
367.018 us, total = 2.569 ms, Queueing time: mean = 127.648 us, max = 189.048 us, min = 20.299 us, total = 893.539 us [state-dump] NodeManager.GCTaskFailureReason - 5 total (1 active), Execution time: mean = 6.901 us, total = 34.506 us, Queueing time: mean = 58.173 us, max = 97.290 us, min = 24.344 us, total = 290.867 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 00:09:54,923 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:09:56,115 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 356173 total (35 active) [state-dump] Queueing time: mean = 253.718 us, 
max = 59.826 s, min = -0.001 s, total = 90.367 s [state-dump] Execution time: mean = 10.280 ms, total = 3661.570 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 85655 total (0 active), Execution time: mean = 528.413 us, total = 45.261 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 85655 total (0 active), Execution time: mean = 35.150 us, total = 3.011 s, Queueing time: mean = 109.620 us, max = 2.841 ms, min = 1.846 us, total = 9.390 s [state-dump] ObjectManager.UpdateAvailableMemory - 40762 total (0 active), Execution time: mean = 6.065 us, total = 247.228 ms, Queueing time: mean = 109.582 us, max = 1.662 ms, min = 2.228 us, total = 4.467 s [state-dump] NodeManager.CheckGC - 40762 total (1 active), Execution time: mean = 2.867 us, total = 116.861 ms, Queueing time: mean = 97.425 us, max = 25.875 ms, min = -0.000 s, total = 3.971 s [state-dump] RaySyncer.OnDemandBroadcasting - 40762 total (1 active), Execution time: mean = 10.483 us, total = 427.311 ms, Queueing time: mean = 90.701 us, max = 25.869 ms, min = 6.166 us, total = 3.697 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 20392 total (1 active), Execution time: mean = 18.250 us, total = 372.157 ms, Queueing time: mean = 76.250 us, max = 26.386 ms, min = -0.001 s, total = 1.555 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 16287 total (1 active), Execution time: mean = 454.462 us, total = 7.402 s, Queueing time: mean = 74.686 us, max = 3.532 ms, min = -0.000 s, total = 1.216 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 4080 total (1 active), Execution time: mean = 3.088 us, total = 12.601 ms, Queueing time: mean = 181.888 us, max = 2.946 ms, min = 3.845 us, total = 742.104 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 4080 total (1 active), Execution 
time: mean = 9.047 us, total = 36.913 ms, Queueing time: mean = 177.797 us, max = 2.947 ms, min = 3.811 us, total = 725.413 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 4080 total (1 active), Execution time: mean = 16.672 us, total = 68.022 ms, Queueing time: mean = 73.873 us, max = 2.581 ms, min = 10.666 us, total = 301.401 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 4079 total (0 active), Execution time: mean = 96.569 us, total = 393.903 ms, Queueing time: mean = 111.831 us, max = 2.934 ms, min = 4.027 us, total = 456.158 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 4079 total (0 active), Execution time: mean = 610.325 us, total = 2.490 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1361 total (1 active), Execution time: mean = 10.032 us, total = 13.654 ms, Queueing time: mean = 75.979 us, max = 442.307 us, min = 14.043 us, total = 103.408 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 816 total (0 active), Execution time: mean = 1.591 ms, total = 1.298 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 816 total (1 active), Execution time: mean = 581.308 us, total = 474.347 ms, Queueing time: mean = 342.201 us, max = 2.010 ms, min = 9.115 us, total = 279.236 ms [state-dump] NodeManager.GcsCheckAlive - 816 total (1 active), Execution time: mean = 306.692 us, total = 250.260 ms, Queueing time: mean = 616.320 us, max = 2.567 ms, min = 6.690 us, total = 502.917 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 816 total (0 active), Execution time: mean = 54.868 us, total = 44.773 ms, Queueing time: mean = 113.269 us, max = 4.779 ms, min = 11.561 us, total = 92.428 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 408 total (1 active), 
Execution time: mean = 1.774 ms, total = 723.746 ms, Queueing time: mean = 69.880 us, max = 176.385 us, min = 11.609 us, total = 28.511 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 68 total (1 active, 1 running), Execution time: mean = 2.690 ms, total = 182.917 ms, Queueing time: mean = 74.208 us, max = 215.454 us, min = 15.835 us, total = 5.046 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 8 total (1 active), Execution time: mean = 449.701 s, total = 3597.605 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 7 total (0 active), Execution time: mean = 
367.018 us, total = 2.569 ms, Queueing time: mean = 127.648 us, max = 189.048 us, min = 20.299 us, total = 893.539 us [state-dump] NodeManager.GCTaskFailureReason - 5 total (1 active), Execution time: mean = 6.901 us, total = 34.506 us, Queueing time: mean = 58.173 us, max = 97.290 us, min = 24.344 us, total = 290.867 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 00:10:54,924 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:10:56,119 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 361396 total (39 active) [state-dump] Queueing time: mean = 251.273 us, 
max = 59.826 s, min = -0.001 s, total = 90.809 s [state-dump] Execution time: mean = 10.134 ms, total = 3662.553 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 86912 total (2 active), Execution time: mean = 528.893 us, total = 45.967 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 86912 total (2 active), Execution time: mean = 35.196 us, total = 3.059 s, Queueing time: mean = 109.817 us, max = 2.841 ms, min = 1.846 us, total = 9.544 s [state-dump] ObjectManager.UpdateAvailableMemory - 41361 total (0 active), Execution time: mean = 6.079 us, total = 251.451 ms, Queueing time: mean = 109.755 us, max = 1.662 ms, min = 2.228 us, total = 4.540 s [state-dump] NodeManager.CheckGC - 41361 total (1 active), Execution time: mean = 2.873 us, total = 118.828 ms, Queueing time: mean = 97.542 us, max = 25.875 ms, min = -0.000 s, total = 4.034 s [state-dump] RaySyncer.OnDemandBroadcasting - 41361 total (1 active), Execution time: mean = 10.508 us, total = 434.633 ms, Queueing time: mean = 90.799 us, max = 25.869 ms, min = 6.166 us, total = 3.756 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 20692 total (1 active), Execution time: mean = 18.337 us, total = 379.439 ms, Queueing time: mean = 76.354 us, max = 26.386 ms, min = -0.001 s, total = 1.580 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 16526 total (1 active), Execution time: mean = 454.782 us, total = 7.516 s, Queueing time: mean = 74.759 us, max = 3.532 ms, min = -0.000 s, total = 1.235 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 4140 total (1 active), Execution time: mean = 3.095 us, total = 12.813 ms, Queueing time: mean = 181.955 us, max = 2.946 ms, min = 3.845 us, total = 753.293 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 4140 total (1 active), Execution 
time: mean = 9.087 us, total = 37.621 ms, Queueing time: mean = 177.843 us, max = 2.947 ms, min = 3.811 us, total = 736.269 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 4140 total (1 active), Execution time: mean = 16.741 us, total = 69.306 ms, Queueing time: mean = 73.951 us, max = 2.581 ms, min = 10.666 us, total = 306.157 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 4138 total (0 active), Execution time: mean = 96.660 us, total = 399.981 ms, Queueing time: mean = 111.937 us, max = 2.934 ms, min = 4.027 us, total = 463.196 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 4138 total (0 active), Execution time: mean = 610.812 us, total = 2.528 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1381 total (1 active), Execution time: mean = 10.026 us, total = 13.846 ms, Queueing time: mean = 75.982 us, max = 442.307 us, min = 14.043 us, total = 104.931 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 828 total (0 active), Execution time: mean = 1.592 ms, total = 1.318 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 828 total (1 active), Execution time: mean = 581.871 us, total = 481.789 ms, Queueing time: mean = 341.976 us, max = 2.010 ms, min = 9.115 us, total = 283.156 ms [state-dump] NodeManager.GcsCheckAlive - 828 total (1 active), Execution time: mean = 307.521 us, total = 254.627 ms, Queueing time: mean = 615.842 us, max = 2.567 ms, min = 6.690 us, total = 509.917 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 828 total (0 active), Execution time: mean = 54.992 us, total = 45.534 ms, Queueing time: mean = 113.167 us, max = 4.779 ms, min = 11.561 us, total = 93.702 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 414 total (1 active), 
Execution time: mean = 1.775 ms, total = 734.748 ms, Queueing time: mean = 70.026 us, max = 176.385 us, min = 11.609 us, total = 28.991 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 69 total (1 active, 1 running), Execution time: mean = 2.694 ms, total = 185.880 ms, Queueing time: mean = 75.243 us, max = 215.454 us, min = 15.835 us, total = 5.192 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 8 total (1 active), Execution time: mean = 449.701 s, total = 3597.605 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 7 total (0 active), Execution time: mean = 
367.018 us, total = 2.569 ms, Queueing time: mean = 127.648 us, max = 189.048 us, min = 20.299 us, total = 893.539 us [state-dump] NodeManager.GCTaskFailureReason - 5 total (1 active), Execution time: mean = 6.901 us, total = 34.506 us, Queueing time: mean = 58.173 us, max = 97.290 us, min = 24.344 us, total = 290.867 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 00:11:54,924 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:11:56,122 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 366605 total (35 active) [state-dump] Queueing time: mean = 248.937 us, 
max = 59.826 s, min = -0.001 s, total = 91.262 s [state-dump] Execution time: mean = 11.630 ms, total = 4263.525 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 88158 total (0 active), Execution time: mean = 529.384 us, total = 46.669 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 88158 total (0 active), Execution time: mean = 35.217 us, total = 3.105 s, Queueing time: mean = 110.140 us, max = 3.003 ms, min = 1.846 us, total = 9.710 s [state-dump] ObjectManager.UpdateAvailableMemory - 41961 total (0 active), Execution time: mean = 6.091 us, total = 255.594 ms, Queueing time: mean = 109.893 us, max = 1.662 ms, min = 2.228 us, total = 4.611 s [state-dump] NodeManager.CheckGC - 41961 total (1 active), Execution time: mean = 2.875 us, total = 120.641 ms, Queueing time: mean = 97.686 us, max = 25.875 ms, min = -0.000 s, total = 4.099 s [state-dump] RaySyncer.OnDemandBroadcasting - 41961 total (1 active), Execution time: mean = 10.545 us, total = 442.498 ms, Queueing time: mean = 90.910 us, max = 25.869 ms, min = 6.166 us, total = 3.815 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 20992 total (1 active), Execution time: mean = 18.403 us, total = 386.312 ms, Queueing time: mean = 76.421 us, max = 26.386 ms, min = -0.001 s, total = 1.604 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 16766 total (1 active), Execution time: mean = 454.814 us, total = 7.625 s, Queueing time: mean = 74.815 us, max = 3.532 ms, min = -0.000 s, total = 1.254 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 4200 total (1 active), Execution time: mean = 3.105 us, total = 13.041 ms, Queueing time: mean = 182.013 us, max = 2.946 ms, min = 3.845 us, total = 764.453 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 4200 total (1 active), Execution 
time: mean = 9.103 us, total = 38.233 ms, Queueing time: mean = 177.900 us, max = 2.947 ms, min = 3.811 us, total = 747.181 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 4200 total (1 active), Execution time: mean = 16.767 us, total = 70.423 ms, Queueing time: mean = 73.978 us, max = 2.581 ms, min = 8.735 us, total = 310.709 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 4198 total (0 active), Execution time: mean = 96.870 us, total = 406.658 ms, Queueing time: mean = 112.107 us, max = 2.934 ms, min = 4.027 us, total = 470.624 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 4198 total (0 active), Execution time: mean = 611.158 us, total = 2.566 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1401 total (1 active), Execution time: mean = 10.042 us, total = 14.069 ms, Queueing time: mean = 75.990 us, max = 442.307 us, min = 14.043 us, total = 106.462 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 840 total (0 active), Execution time: mean = 1.593 ms, total = 1.338 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 840 total (1 active), Execution time: mean = 582.046 us, total = 488.918 ms, Queueing time: mean = 341.861 us, max = 2.010 ms, min = 9.115 us, total = 287.163 ms [state-dump] NodeManager.GcsCheckAlive - 840 total (1 active), Execution time: mean = 308.387 us, total = 259.045 ms, Queueing time: mean = 615.259 us, max = 2.567 ms, min = 6.690 us, total = 516.817 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 840 total (0 active), Execution time: mean = 55.025 us, total = 46.221 ms, Queueing time: mean = 113.122 us, max = 4.779 ms, min = 11.561 us, total = 95.022 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 420 total (1 active), 
Execution time: mean = 1.775 ms, total = 745.322 ms, Queueing time: mean = 70.130 us, max = 176.385 us, min = 11.609 us, total = 29.455 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 70 total (1 active, 1 running), Execution time: mean = 2.698 ms, total = 188.850 ms, Queueing time: mean = 78.812 us, max = 325.100 us, min = 15.835 us, total = 5.517 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 9 total (1 active), Execution time: mean = 466.401 s, total = 4197.607 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 8 total (0 active), Execution time: mean = 
381.115 us, total = 3.049 ms, Queueing time: mean = 142.114 us, max = 243.371 us, min = 20.299 us, total = 1.137 ms [state-dump] NodeManager.GCTaskFailureReason - 5 total (1 active), Execution time: mean = 6.901 us, total = 34.506 us, Queueing time: mean = 58.173 us, max = 97.290 us, min = 24.344 us, total = 290.867 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 00:12:54,924 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:12:56,125 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 371836 total (35 active) [state-dump] Queueing time: mean = 246.555 us, 
max = 59.826 s, min = -0.001 s, total = 91.678 s [state-dump] Execution time: mean = 11.469 ms, total = 4264.440 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 89418 total (0 active), Execution time: mean = 529.241 us, total = 47.324 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 89418 total (0 active), Execution time: mean = 35.192 us, total = 3.147 s, Queueing time: mean = 110.176 us, max = 3.003 ms, min = 1.846 us, total = 9.852 s [state-dump] ObjectManager.UpdateAvailableMemory - 42560 total (0 active), Execution time: mean = 6.095 us, total = 259.393 ms, Queueing time: mean = 109.871 us, max = 1.662 ms, min = 2.228 us, total = 4.676 s [state-dump] NodeManager.CheckGC - 42560 total (1 active), Execution time: mean = 2.878 us, total = 122.481 ms, Queueing time: mean = 97.746 us, max = 25.875 ms, min = -0.000 s, total = 4.160 s [state-dump] RaySyncer.OnDemandBroadcasting - 42560 total (1 active), Execution time: mean = 10.562 us, total = 449.503 ms, Queueing time: mean = 90.958 us, max = 25.869 ms, min = 6.166 us, total = 3.871 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 21292 total (1 active), Execution time: mean = 18.420 us, total = 392.199 ms, Queueing time: mean = 76.374 us, max = 26.386 ms, min = -0.001 s, total = 1.626 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 17005 total (1 active), Execution time: mean = 454.862 us, total = 7.735 s, Queueing time: mean = 74.813 us, max = 3.532 ms, min = -0.000 s, total = 1.272 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 4260 total (1 active), Execution time: mean = 3.108 us, total = 13.239 ms, Queueing time: mean = 182.299 us, max = 2.946 ms, min = 3.845 us, total = 776.595 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 4260 total (1 active), Execution 
time: mean = 9.130 us, total = 38.894 ms, Queueing time: mean = 178.173 us, max = 2.947 ms, min = 3.811 us, total = 759.018 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 4260 total (1 active), Execution time: mean = 16.766 us, total = 71.424 ms, Queueing time: mean = 74.070 us, max = 2.581 ms, min = 8.735 us, total = 315.538 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 4258 total (0 active), Execution time: mean = 96.924 us, total = 412.702 ms, Queueing time: mean = 112.184 us, max = 2.934 ms, min = 4.027 us, total = 477.680 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 4258 total (0 active), Execution time: mean = 611.027 us, total = 2.602 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1421 total (1 active), Execution time: mean = 10.048 us, total = 14.279 ms, Queueing time: mean = 75.918 us, max = 442.307 us, min = 14.043 us, total = 107.879 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 852 total (0 active), Execution time: mean = 1.592 ms, total = 1.356 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 852 total (1 active), Execution time: mean = 582.619 us, total = 496.392 ms, Queueing time: mean = 343.032 us, max = 2.010 ms, min = 9.115 us, total = 292.264 ms [state-dump] NodeManager.GcsCheckAlive - 852 total (1 active), Execution time: mean = 309.175 us, total = 263.417 ms, Queueing time: mean = 616.092 us, max = 2.567 ms, min = 6.690 us, total = 524.910 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 852 total (0 active), Execution time: mean = 55.041 us, total = 46.895 ms, Queueing time: mean = 112.916 us, max = 4.779 ms, min = 11.561 us, total = 96.205 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 426 total (1 active), 
Execution time: mean = 1.778 ms, total = 757.295 ms, Queueing time: mean = 70.245 us, max = 176.385 us, min = 11.609 us, total = 29.924 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 71 total (1 active, 1 running), Execution time: mean = 2.702 ms, total = 191.814 ms, Queueing time: mean = 78.629 us, max = 325.100 us, min = 15.835 us, total = 5.583 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 9 total (1 active), Execution time: mean = 466.401 s, total = 4197.607 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 8 total (0 active), Execution time: mean = 
381.115 us, total = 3.049 ms, Queueing time: mean = 142.114 us, max = 243.371 us, min = 20.299 us, total = 1.137 ms [state-dump] NodeManager.GCTaskFailureReason - 5 total (1 active), Execution time: mean = 6.901 us, total = 34.506 us, Queueing time: mean = 58.173 us, max = 97.290 us, min = 24.344 us, total = 290.867 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:13:54,925 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:13:56,128 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 377068 total (35 active) [state-dump] Queueing time: mean = 244.294 us, 
max = 59.826 s, min = -0.001 s, total = 92.115 s [state-dump] Execution time: mean = 11.312 ms, total = 4265.408 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 90678 total (0 active), Execution time: mean = 529.531 us, total = 48.017 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 90678 total (0 active), Execution time: mean = 35.217 us, total = 3.193 s, Queueing time: mean = 110.261 us, max = 3.003 ms, min = 1.846 us, total = 9.998 s [state-dump] ObjectManager.UpdateAvailableMemory - 43159 total (0 active), Execution time: mean = 6.109 us, total = 263.658 ms, Queueing time: mean = 109.960 us, max = 1.662 ms, min = 2.228 us, total = 4.746 s [state-dump] NodeManager.CheckGC - 43159 total (1 active), Execution time: mean = 2.883 us, total = 124.409 ms, Queueing time: mean = 97.876 us, max = 25.875 ms, min = -0.000 s, total = 4.224 s [state-dump] RaySyncer.OnDemandBroadcasting - 43159 total (1 active), Execution time: mean = 10.590 us, total = 457.051 ms, Queueing time: mean = 91.065 us, max = 25.869 ms, min = 6.166 us, total = 3.930 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 21592 total (1 active), Execution time: mean = 18.466 us, total = 398.708 ms, Queueing time: mean = 76.455 us, max = 26.386 ms, min = -0.001 s, total = 1.651 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 17245 total (1 active), Execution time: mean = 455.080 us, total = 7.848 s, Queueing time: mean = 75.039 us, max = 3.532 ms, min = -0.000 s, total = 1.294 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 4320 total (1 active), Execution time: mean = 3.111 us, total = 13.440 ms, Queueing time: mean = 182.552 us, max = 2.946 ms, min = 3.845 us, total = 788.626 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 4320 total (1 active), Execution 
time: mean = 9.148 us, total = 39.519 ms, Queueing time: mean = 178.417 us, max = 2.947 ms, min = 3.811 us, total = 770.763 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 4320 total (1 active), Execution time: mean = 16.783 us, total = 72.502 ms, Queueing time: mean = 73.979 us, max = 2.581 ms, min = 8.735 us, total = 319.590 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 4318 total (0 active), Execution time: mean = 96.966 us, total = 418.698 ms, Queueing time: mean = 112.346 us, max = 2.934 ms, min = 4.027 us, total = 485.110 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 4318 total (0 active), Execution time: mean = 611.465 us, total = 2.640 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1441 total (1 active), Execution time: mean = 10.055 us, total = 14.489 ms, Queueing time: mean = 76.059 us, max = 442.307 us, min = 14.043 us, total = 109.601 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 864 total (0 active), Execution time: mean = 1.594 ms, total = 1.377 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 864 total (1 active), Execution time: mean = 583.142 us, total = 503.835 ms, Queueing time: mean = 343.767 us, max = 2.010 ms, min = 9.115 us, total = 297.015 ms [state-dump] NodeManager.GcsCheckAlive - 864 total (1 active), Execution time: mean = 310.166 us, total = 267.983 ms, Queueing time: mean = 616.332 us, max = 2.567 ms, min = 6.690 us, total = 532.511 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 864 total (0 active), Execution time: mean = 55.087 us, total = 47.595 ms, Queueing time: mean = 113.046 us, max = 4.779 ms, min = 11.561 us, total = 97.671 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 432 total (1 active), 
Execution time: mean = 1.780 ms, total = 769.128 ms, Queueing time: mean = 70.234 us, max = 176.385 us, min = 11.609 us, total = 30.341 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 72 total (1 active, 1 running), Execution time: mean = 2.709 ms, total = 195.063 ms, Queueing time: mean = 78.549 us, max = 325.100 us, min = 15.835 us, total = 5.655 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 9 total (1 active), Execution time: mean = 466.401 s, total = 4197.607 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 8 total (0 active), Execution time: mean = 
381.115 us, total = 3.049 ms, Queueing time: mean = 142.114 us, max = 243.371 us, min = 20.299 us, total = 1.137 ms [state-dump] NodeManager.GCTaskFailureReason - 5 total (1 active), Execution time: mean = 6.901 us, total = 34.506 us, Queueing time: mean = 58.173 us, max = 97.290 us, min = 24.344 us, total = 290.867 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:14:54,925 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:14:56,131 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 382301 total (35 active) [state-dump] Queueing time: mean = 242.098 us, 
max = 59.826 s, min = -0.001 s, total = 92.554 s [state-dump] Execution time: mean = 11.160 ms, total = 4266.374 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 91938 total (0 active), Execution time: mean = 529.843 us, total = 48.713 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 91938 total (0 active), Execution time: mean = 35.236 us, total = 3.240 s, Queueing time: mean = 110.366 us, max = 3.003 ms, min = 1.846 us, total = 10.147 s [state-dump] ObjectManager.UpdateAvailableMemory - 43759 total (0 active), Execution time: mean = 6.124 us, total = 267.977 ms, Queueing time: mean = 110.090 us, max = 1.662 ms, min = 2.228 us, total = 4.817 s [state-dump] NodeManager.CheckGC - 43759 total (1 active), Execution time: mean = 2.886 us, total = 126.283 ms, Queueing time: mean = 98.072 us, max = 25.875 ms, min = -0.000 s, total = 4.292 s [state-dump] RaySyncer.OnDemandBroadcasting - 43759 total (1 active), Execution time: mean = 10.616 us, total = 464.540 ms, Queueing time: mean = 91.241 us, max = 25.869 ms, min = 6.166 us, total = 3.993 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 21891 total (1 active), Execution time: mean = 18.504 us, total = 405.081 ms, Queueing time: mean = 76.510 us, max = 26.386 ms, min = -0.001 s, total = 1.675 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 17484 total (1 active), Execution time: mean = 455.320 us, total = 7.961 s, Queueing time: mean = 75.076 us, max = 3.532 ms, min = -0.000 s, total = 1.313 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 4380 total (1 active), Execution time: mean = 3.116 us, total = 13.648 ms, Queueing time: mean = 182.467 us, max = 2.946 ms, min = 3.845 us, total = 799.203 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 4380 total (1 active), Execution 
time: mean = 9.165 us, total = 40.144 ms, Queueing time: mean = 178.325 us, max = 2.947 ms, min = 3.811 us, total = 781.063 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 4380 total (1 active), Execution time: mean = 16.805 us, total = 73.604 ms, Queueing time: mean = 74.029 us, max = 2.581 ms, min = 8.735 us, total = 324.246 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 4378 total (0 active), Execution time: mean = 97.050 us, total = 424.883 ms, Queueing time: mean = 112.402 us, max = 2.934 ms, min = 4.027 us, total = 492.096 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 4378 total (0 active), Execution time: mean = 611.655 us, total = 2.678 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1461 total (1 active), Execution time: mean = 10.077 us, total = 14.723 ms, Queueing time: mean = 76.097 us, max = 442.307 us, min = 14.043 us, total = 111.177 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 876 total (0 active), Execution time: mean = 1.594 ms, total = 1.397 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 876 total (1 active), Execution time: mean = 583.185 us, total = 510.870 ms, Queueing time: mean = 343.317 us, max = 2.010 ms, min = 9.115 us, total = 300.746 ms [state-dump] NodeManager.GcsCheckAlive - 876 total (1 active), Execution time: mean = 310.629 us, total = 272.111 ms, Queueing time: mean = 615.392 us, max = 2.567 ms, min = 6.690 us, total = 539.083 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 876 total (0 active), Execution time: mean = 55.105 us, total = 48.272 ms, Queueing time: mean = 113.172 us, max = 4.779 ms, min = 11.561 us, total = 99.138 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 438 total (1 active), 
Execution time: mean = 1.780 ms, total = 779.747 ms, Queueing time: mean = 70.235 us, max = 176.385 us, min = 11.609 us, total = 30.763 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 73 total (1 active, 1 running), Execution time: mean = 2.713 ms, total = 198.060 ms, Queueing time: mean = 78.828 us, max = 325.100 us, min = 15.835 us, total = 5.754 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 9 total (1 active), Execution time: mean = 466.401 s, total = 4197.607 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 8 total (0 active), Execution time: mean = 
381.115 us, total = 3.049 ms, Queueing time: mean = 142.114 us, max = 243.371 us, min = 20.299 us, total = 1.137 ms [state-dump] NodeManager.GCTaskFailureReason - 5 total (1 active), Execution time: mean = 6.901 us, total = 34.506 us, Queueing time: mean = 58.173 us, max = 97.290 us, min = 24.344 us, total = 290.867 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:15:54,925 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:15:56,134 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 387533 total (35 active) [state-dump] Queueing time: mean = 239.920 us, 
max = 59.826 s, min = -0.001 s, total = 92.977 s [state-dump] Execution time: mean = 11.012 ms, total = 4267.339 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 93198 total (0 active), Execution time: mean = 530.157 us, total = 49.410 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 93198 total (0 active), Execution time: mean = 35.271 us, total = 3.287 s, Queueing time: mean = 110.495 us, max = 3.003 ms, min = 1.846 us, total = 10.298 s [state-dump] ObjectManager.UpdateAvailableMemory - 44358 total (0 active), Execution time: mean = 6.134 us, total = 272.110 ms, Queueing time: mean = 110.128 us, max = 1.662 ms, min = 2.228 us, total = 4.885 s [state-dump] NodeManager.CheckGC - 44358 total (1 active), Execution time: mean = 2.888 us, total = 128.096 ms, Queueing time: mean = 98.103 us, max = 25.875 ms, min = -0.000 s, total = 4.352 s [state-dump] RaySyncer.OnDemandBroadcasting - 44358 total (1 active), Execution time: mean = 10.635 us, total = 471.725 ms, Queueing time: mean = 91.256 us, max = 25.869 ms, min = 6.166 us, total = 4.048 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 22191 total (1 active), Execution time: mean = 18.526 us, total = 411.104 ms, Queueing time: mean = 76.604 us, max = 26.386 ms, min = -0.001 s, total = 1.700 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 17724 total (1 active), Execution time: mean = 455.537 us, total = 8.074 s, Queueing time: mean = 75.206 us, max = 3.532 ms, min = -0.000 s, total = 1.333 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 4440 total (1 active), Execution time: mean = 3.117 us, total = 13.842 ms, Queueing time: mean = 182.166 us, max = 2.946 ms, min = 3.845 us, total = 808.815 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 4440 total (1 active), Execution 
time: mean = 9.177 us, total = 40.746 ms, Queueing time: mean = 178.018 us, max = 2.947 ms, min = 3.811 us, total = 790.401 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 4440 total (1 active), Execution time: mean = 16.834 us, total = 74.743 ms, Queueing time: mean = 74.126 us, max = 2.581 ms, min = 8.735 us, total = 329.120 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 4438 total (0 active), Execution time: mean = 97.095 us, total = 430.908 ms, Queueing time: mean = 112.395 us, max = 2.934 ms, min = 4.027 us, total = 498.808 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 4438 total (0 active), Execution time: mean = 611.794 us, total = 2.715 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1481 total (1 active), Execution time: mean = 10.083 us, total = 14.933 ms, Queueing time: mean = 76.091 us, max = 442.307 us, min = 14.043 us, total = 112.691 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 888 total (0 active), Execution time: mean = 1.595 ms, total = 1.417 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 888 total (1 active), Execution time: mean = 582.358 us, total = 517.134 ms, Queueing time: mean = 342.737 us, max = 2.010 ms, min = 9.115 us, total = 304.350 ms [state-dump] NodeManager.GcsCheckAlive - 888 total (1 active), Execution time: mean = 311.407 us, total = 276.530 ms, Queueing time: mean = 613.231 us, max = 2.567 ms, min = 6.690 us, total = 544.549 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 888 total (0 active), Execution time: mean = 55.171 us, total = 48.992 ms, Queueing time: mean = 113.161 us, max = 4.779 ms, min = 11.561 us, total = 100.487 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 444 total (1 active), 
Execution time: mean = 1.777 ms, total = 788.976 ms, Queueing time: mean = 70.136 us, max = 176.385 us, min = 11.609 us, total = 31.140 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 74 total (1 active, 1 running), Execution time: mean = 2.717 ms, total = 201.029 ms, Queueing time: mean = 78.842 us, max = 325.100 us, min = 15.835 us, total = 5.834 ms [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 9 total (1 active), Execution time: mean = 466.401 s, total = 4197.607 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 8 total (0 active), Execution time: mean = 
381.115 us, total = 3.049 ms, Queueing time: mean = 142.114 us, max = 243.371 us, min = 20.299 us, total = 1.137 ms [state-dump] NodeManager.GCTaskFailureReason - 5 total (1 active), Execution time: mean = 6.901 us, total = 34.506 us, Queueing time: mean = 58.173 us, max = 97.290 us, min = 24.344 us, total = 290.867 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:16:54,926 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:16:56,137 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 392765 total (35 active) [state-dump] Queueing time: mean = 237.825 us, 
max = 59.826 s, min = -0.001 s, total = 93.409 s [state-dump] Execution time: mean = 10.867 ms, total = 4268.294 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 94458 total (0 active), Execution time: mean = 530.338 us, total = 50.095 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 94458 total (0 active), Execution time: mean = 35.286 us, total = 3.333 s, Queueing time: mean = 110.543 us, max = 3.003 ms, min = 1.846 us, total = 10.442 s [state-dump] ObjectManager.UpdateAvailableMemory - 44957 total (0 active), Execution time: mean = 6.144 us, total = 276.227 ms, Queueing time: mean = 110.257 us, max = 3.228 ms, min = 2.228 us, total = 4.957 s [state-dump] NodeManager.CheckGC - 44957 total (1 active), Execution time: mean = 2.891 us, total = 129.980 ms, Queueing time: mean = 98.218 us, max = 25.875 ms, min = -0.000 s, total = 4.416 s [state-dump] RaySyncer.OnDemandBroadcasting - 44957 total (1 active), Execution time: mean = 10.655 us, total = 479.038 ms, Queueing time: mean = 91.352 us, max = 25.869 ms, min = 6.166 us, total = 4.107 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 22491 total (1 active), Execution time: mean = 18.564 us, total = 417.513 ms, Queueing time: mean = 76.638 us, max = 26.386 ms, min = -0.001 s, total = 1.724 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 17963 total (1 active), Execution time: mean = 455.673 us, total = 8.185 s, Queueing time: mean = 75.217 us, max = 3.532 ms, min = -0.000 s, total = 1.351 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 4500 total (1 active), Execution time: mean = 3.119 us, total = 14.037 ms, Queueing time: mean = 182.429 us, max = 2.946 ms, min = 3.845 us, total = 820.932 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 4500 total (1 active), Execution 
time: mean = 9.191 us, total = 41.360 ms, Queueing time: mean = 178.276 us, max = 2.947 ms, min = 3.811 us, total = 802.240 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 4500 total (1 active), Execution time: mean = 16.843 us, total = 75.793 ms, Queueing time: mean = 74.246 us, max = 2.581 ms, min = 8.735 us, total = 334.107 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 4498 total (0 active), Execution time: mean = 97.124 us, total = 436.862 ms, Queueing time: mean = 112.421 us, max = 2.934 ms, min = 4.027 us, total = 505.668 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 4498 total (0 active), Execution time: mean = 611.906 us, total = 2.752 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1501 total (1 active), Execution time: mean = 10.082 us, total = 15.133 ms, Queueing time: mean = 76.016 us, max = 442.307 us, min = 14.043 us, total = 114.100 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 900 total (0 active), Execution time: mean = 1.596 ms, total = 1.437 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 900 total (1 active), Execution time: mean = 582.651 us, total = 524.386 ms, Queueing time: mean = 343.790 us, max = 2.010 ms, min = 9.115 us, total = 309.411 ms [state-dump] NodeManager.GcsCheckAlive - 900 total (1 active), Execution time: mean = 312.178 us, total = 280.960 ms, Queueing time: mean = 613.782 us, max = 2.567 ms, min = 6.690 us, total = 552.404 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 900 total (0 active), Execution time: mean = 55.233 us, total = 49.709 ms, Queueing time: mean = 113.088 us, max = 4.779 ms, min = 11.561 us, total = 101.779 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 450 total (1 active), 
Execution time: mean = 1.779 ms, total = 800.507 ms, Queueing time: mean = 70.346 us, max = 176.385 us, min = 11.609 us, total = 31.656 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 75 total (1 active, 1 running), Execution time: mean = 2.721 ms, total = 204.044 ms, Queueing time: mean = 79.234 us, max = 325.100 us, min = 15.835 us, total = 5.943 ms [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 9 total (1 active), Execution time: mean = 466.401 s, total = 4197.607 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 8 total (0 active), Execution time: mean = 
381.115 us, total = 3.049 ms, Queueing time: mean = 142.114 us, max = 243.371 us, min = 20.299 us, total = 1.137 ms [state-dump] NodeManager.GCTaskFailureReason - 6 total (1 active), Execution time: mean = 7.110 us, total = 42.659 us, Queueing time: mean = 57.967 us, max = 97.290 us, min = 24.344 us, total = 347.804 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:17:54,926 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:17:56,140 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 398000 total (35 active) [state-dump] Queueing time: mean = 235.753 us, 
max = 59.826 s, min = -0.001 s, total = 93.830 s [state-dump] Execution time: mean = 10.727 ms, total = 4269.225 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 95718 total (0 active), Execution time: mean = 530.349 us, total = 50.764 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 95718 total (0 active), Execution time: mean = 35.286 us, total = 3.377 s, Queueing time: mean = 110.640 us, max = 3.003 ms, min = 1.846 us, total = 10.590 s [state-dump] ObjectManager.UpdateAvailableMemory - 45557 total (0 active), Execution time: mean = 6.149 us, total = 280.147 ms, Queueing time: mean = 110.272 us, max = 3.228 ms, min = 2.228 us, total = 5.024 s [state-dump] NodeManager.CheckGC - 45557 total (1 active), Execution time: mean = 2.893 us, total = 131.795 ms, Queueing time: mean = 98.267 us, max = 25.875 ms, min = -0.000 s, total = 4.477 s [state-dump] RaySyncer.OnDemandBroadcasting - 45557 total (1 active), Execution time: mean = 10.669 us, total = 486.058 ms, Queueing time: mean = 91.391 us, max = 25.869 ms, min = 6.166 us, total = 4.163 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 22791 total (1 active), Execution time: mean = 18.568 us, total = 423.193 ms, Queueing time: mean = 76.626 us, max = 26.386 ms, min = -0.001 s, total = 1.746 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 18203 total (1 active), Execution time: mean = 455.817 us, total = 8.297 s, Queueing time: mean = 75.236 us, max = 3.532 ms, min = -0.000 s, total = 1.370 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 4560 total (1 active), Execution time: mean = 3.120 us, total = 14.227 ms, Queueing time: mean = 182.337 us, max = 2.946 ms, min = 3.845 us, total = 831.455 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 4560 total (1 active), Execution 
time: mean = 9.198 us, total = 41.942 ms, Queueing time: mean = 178.180 us, max = 2.947 ms, min = 3.811 us, total = 812.503 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 4560 total (1 active), Execution time: mean = 16.842 us, total = 76.797 ms, Queueing time: mean = 74.258 us, max = 2.581 ms, min = 8.735 us, total = 338.618 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 4558 total (0 active), Execution time: mean = 97.128 us, total = 442.708 ms, Queueing time: mean = 112.446 us, max = 2.934 ms, min = 4.027 us, total = 512.529 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 4558 total (0 active), Execution time: mean = 611.679 us, total = 2.788 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1521 total (1 active), Execution time: mean = 10.076 us, total = 15.326 ms, Queueing time: mean = 75.986 us, max = 442.307 us, min = 14.043 us, total = 115.574 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 912 total (0 active), Execution time: mean = 1.596 ms, total = 1.456 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 912 total (1 active), Execution time: mean = 581.998 us, total = 530.783 ms, Queueing time: mean = 343.954 us, max = 2.010 ms, min = 9.115 us, total = 313.686 ms [state-dump] NodeManager.GcsCheckAlive - 912 total (1 active), Execution time: mean = 312.381 us, total = 284.891 ms, Queueing time: mean = 613.103 us, max = 2.567 ms, min = 6.690 us, total = 559.150 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 912 total (0 active), Execution time: mean = 55.222 us, total = 50.363 ms, Queueing time: mean = 112.793 us, max = 4.779 ms, min = 11.561 us, total = 102.867 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 456 total (1 active), 
Execution time: mean = 1.779 ms, total = 811.221 ms, Queueing time: mean = 70.469 us, max = 176.385 us, min = 11.609 us, total = 32.134 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 76 total (1 active, 1 running), Execution time: mean = 2.723 ms, total = 206.984 ms, Queueing time: mean = 79.190 us, max = 325.100 us, min = 15.835 us, total = 6.018 ms [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 9 total (1 active), Execution time: mean = 466.401 s, total = 4197.607 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 8 total (0 active), Execution time: mean = 
381.115 us, total = 3.049 ms, Queueing time: mean = 142.114 us, max = 243.371 us, min = 20.299 us, total = 1.137 ms [state-dump] NodeManager.GCTaskFailureReason - 6 total (1 active), Execution time: mean = 7.110 us, total = 42.659 us, Queueing time: mean = 57.967 us, max = 97.290 us, min = 24.344 us, total = 347.804 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:18:54,926 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:18:56,143 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 403231 total (35 active) [state-dump] Queueing time: mean = 233.771 us, 
max = 59.826 s, min = -0.001 s, total = 94.264 s [state-dump] Execution time: mean = 10.590 ms, total = 4270.188 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 96978 total (0 active), Execution time: mean = 530.614 us, total = 51.458 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 96978 total (0 active), Execution time: mean = 35.308 us, total = 3.424 s, Queueing time: mean = 110.708 us, max = 3.003 ms, min = 1.846 us, total = 10.736 s [state-dump] ObjectManager.UpdateAvailableMemory - 46156 total (0 active), Execution time: mean = 6.160 us, total = 284.319 ms, Queueing time: mean = 110.402 us, max = 3.228 ms, min = 2.228 us, total = 5.096 s [state-dump] NodeManager.CheckGC - 46156 total (1 active), Execution time: mean = 2.896 us, total = 133.667 ms, Queueing time: mean = 98.328 us, max = 25.875 ms, min = -0.000 s, total = 4.538 s [state-dump] RaySyncer.OnDemandBroadcasting - 46156 total (1 active), Execution time: mean = 10.687 us, total = 493.260 ms, Queueing time: mean = 91.439 us, max = 25.869 ms, min = 6.166 us, total = 4.220 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 23091 total (1 active), Execution time: mean = 18.589 us, total = 429.236 ms, Queueing time: mean = 76.650 us, max = 26.386 ms, min = -0.001 s, total = 1.770 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 18442 total (1 active), Execution time: mean = 456.017 us, total = 8.410 s, Queueing time: mean = 75.378 us, max = 3.532 ms, min = -0.000 s, total = 1.390 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 4620 total (1 active), Execution time: mean = 3.123 us, total = 14.428 ms, Queueing time: mean = 182.550 us, max = 2.946 ms, min = 3.845 us, total = 843.380 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 4620 total (1 active), Execution 
time: mean = 9.223 us, total = 42.608 ms, Queueing time: mean = 178.381 us, max = 2.947 ms, min = 3.811 us, total = 824.119 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 4620 total (1 active), Execution time: mean = 16.836 us, total = 77.784 ms, Queueing time: mean = 74.531 us, max = 2.581 ms, min = 8.735 us, total = 344.334 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 4618 total (0 active), Execution time: mean = 97.166 us, total = 448.711 ms, Queueing time: mean = 112.528 us, max = 2.934 ms, min = 4.027 us, total = 519.653 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 4618 total (0 active), Execution time: mean = 611.703 us, total = 2.825 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1541 total (1 active), Execution time: mean = 10.064 us, total = 15.509 ms, Queueing time: mean = 75.853 us, max = 442.307 us, min = 14.043 us, total = 116.889 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 924 total (0 active), Execution time: mean = 1.597 ms, total = 1.476 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 924 total (1 active), Execution time: mean = 581.580 us, total = 537.380 ms, Queueing time: mean = 345.484 us, max = 2.010 ms, min = 9.115 us, total = 319.228 ms [state-dump] NodeManager.GcsCheckAlive - 924 total (1 active), Execution time: mean = 312.899 us, total = 289.119 ms, Queueing time: mean = 613.726 us, max = 2.567 ms, min = 6.690 us, total = 567.083 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 924 total (0 active), Execution time: mean = 55.326 us, total = 51.121 ms, Queueing time: mean = 112.934 us, max = 4.779 ms, min = 11.561 us, total = 104.351 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 462 total (1 active), 
Execution time: mean = 1.780 ms, total = 822.496 ms, Queueing time: mean = 70.764 us, max = 176.385 us, min = 11.609 us, total = 32.693 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 77 total (1 active, 1 running), Execution time: mean = 2.725 ms, total = 209.791 ms, Queueing time: mean = 78.939 us, max = 325.100 us, min = 15.835 us, total = 6.078 ms [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 9 total (1 active), Execution time: mean = 466.401 s, total = 4197.607 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 8 total (0 active), Execution time: mean = 
381.115 us, total = 3.049 ms, Queueing time: mean = 142.114 us, max = 243.371 us, min = 20.299 us, total = 1.137 ms [state-dump] NodeManager.GCTaskFailureReason - 6 total (1 active), Execution time: mean = 7.110 us, total = 42.659 us, Queueing time: mean = 57.967 us, max = 97.290 us, min = 24.344 us, total = 347.804 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:19:54,927 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:19:56,146 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 408466 total (35 active) [state-dump] Queueing time: mean = 231.845 us, 
max = 59.826 s, min = -0.001 s, total = 94.701 s [state-dump] Execution time: mean = 10.457 ms, total = 4271.163 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 98238 total (0 active), Execution time: mean = 530.931 us, total = 52.158 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 98238 total (0 active), Execution time: mean = 35.371 us, total = 3.475 s, Queueing time: mean = 110.810 us, max = 3.003 ms, min = 1.846 us, total = 10.886 s [state-dump] ObjectManager.UpdateAvailableMemory - 46756 total (0 active), Execution time: mean = 6.172 us, total = 288.570 ms, Queueing time: mean = 110.544 us, max = 3.228 ms, min = 2.228 us, total = 5.169 s [state-dump] NodeManager.CheckGC - 46756 total (1 active), Execution time: mean = 2.899 us, total = 135.562 ms, Queueing time: mean = 98.451 us, max = 25.875 ms, min = -0.000 s, total = 4.603 s [state-dump] RaySyncer.OnDemandBroadcasting - 46756 total (1 active), Execution time: mean = 10.709 us, total = 500.693 ms, Queueing time: mean = 91.546 us, max = 25.869 ms, min = 6.166 us, total = 4.280 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 23391 total (1 active), Execution time: mean = 18.644 us, total = 436.096 ms, Queueing time: mean = 76.725 us, max = 26.386 ms, min = -0.001 s, total = 1.795 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 18682 total (1 active), Execution time: mean = 456.246 us, total = 8.524 s, Queueing time: mean = 75.445 us, max = 3.532 ms, min = -0.000 s, total = 1.409 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 4680 total (1 active), Execution time: mean = 3.126 us, total = 14.630 ms, Queueing time: mean = 182.471 us, max = 2.946 ms, min = 3.845 us, total = 853.963 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 4680 total (1 active), Execution 
time: mean = 9.245 us, total = 43.266 ms, Queueing time: mean = 178.290 us, max = 2.947 ms, min = 3.811 us, total = 834.395 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 4680 total (1 active), Execution time: mean = 16.852 us, total = 78.867 ms, Queueing time: mean = 74.546 us, max = 2.581 ms, min = 8.735 us, total = 348.877 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 4678 total (0 active), Execution time: mean = 97.181 us, total = 454.611 ms, Queueing time: mean = 112.562 us, max = 2.934 ms, min = 4.027 us, total = 526.566 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 4678 total (0 active), Execution time: mean = 611.597 us, total = 2.861 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1561 total (1 active), Execution time: mean = 10.089 us, total = 15.748 ms, Queueing time: mean = 75.891 us, max = 442.307 us, min = 14.043 us, total = 118.466 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 936 total (0 active), Execution time: mean = 1.598 ms, total = 1.496 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 936 total (1 active), Execution time: mean = 582.233 us, total = 544.970 ms, Queueing time: mean = 344.479 us, max = 2.010 ms, min = 9.115 us, total = 322.432 ms [state-dump] NodeManager.GcsCheckAlive - 936 total (1 active), Execution time: mean = 313.314 us, total = 293.262 ms, Queueing time: mean = 612.953 us, max = 2.567 ms, min = 6.690 us, total = 573.724 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 936 total (0 active), Execution time: mean = 55.439 us, total = 51.891 ms, Queueing time: mean = 113.023 us, max = 4.779 ms, min = 11.561 us, total = 105.790 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 468 total (1 active), 
Execution time: mean = 1.779 ms, total = 832.733 ms, Queueing time: mean = 70.876 us, max = 176.385 us, min = 11.609 us, total = 33.170 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 78 total (1 active, 1 running), Execution time: mean = 2.727 ms, total = 212.676 ms, Queueing time: mean = 79.088 us, max = 325.100 us, min = 15.835 us, total = 6.169 ms [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 9 total (1 active), Execution time: mean = 466.401 s, total = 4197.607 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 8 total (0 active), Execution time: mean = 
381.115 us, total = 3.049 ms, Queueing time: mean = 142.114 us, max = 243.371 us, min = 20.299 us, total = 1.137 ms [state-dump] NodeManager.GCTaskFailureReason - 6 total (1 active), Execution time: mean = 7.110 us, total = 42.659 us, Queueing time: mean = 57.967 us, max = 97.290 us, min = 24.344 us, total = 347.804 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:20:54,927 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:20:56,149 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 413697 total (35 active) [state-dump] Queueing time: mean = 229.945 us, 
max = 59.826 s, min = -0.001 s, total = 95.128 s [state-dump] Execution time: mean = 10.327 ms, total = 4272.123 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 99498 total (0 active), Execution time: mean = 531.161 us, total = 52.849 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 99498 total (0 active), Execution time: mean = 35.374 us, total = 3.520 s, Queueing time: mean = 110.824 us, max = 3.003 ms, min = 1.846 us, total = 11.027 s [state-dump] ObjectManager.UpdateAvailableMemory - 47355 total (0 active), Execution time: mean = 6.180 us, total = 292.669 ms, Queueing time: mean = 110.614 us, max = 3.228 ms, min = 2.228 us, total = 5.238 s [state-dump] NodeManager.CheckGC - 47355 total (1 active), Execution time: mean = 2.903 us, total = 137.452 ms, Queueing time: mean = 98.519 us, max = 25.875 ms, min = -0.000 s, total = 4.665 s [state-dump] RaySyncer.OnDemandBroadcasting - 47355 total (1 active), Execution time: mean = 10.727 us, total = 507.974 ms, Queueing time: mean = 91.599 us, max = 25.869 ms, min = 6.166 us, total = 4.338 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 23691 total (1 active), Execution time: mean = 18.650 us, total = 441.849 ms, Queueing time: mean = 76.689 us, max = 26.386 ms, min = -0.001 s, total = 1.817 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 18921 total (1 active), Execution time: mean = 456.305 us, total = 8.634 s, Queueing time: mean = 75.500 us, max = 3.532 ms, min = -0.000 s, total = 1.429 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 4740 total (1 active), Execution time: mean = 3.127 us, total = 14.823 ms, Queueing time: mean = 182.923 us, max = 2.946 ms, min = 3.845 us, total = 867.057 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 4740 total (1 active), Execution 
time: mean = 9.256 us, total = 43.875 ms, Queueing time: mean = 178.737 us, max = 2.947 ms, min = 3.811 us, total = 847.214 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 4740 total (1 active), Execution time: mean = 16.861 us, total = 79.920 ms, Queueing time: mean = 74.989 us, max = 2.581 ms, min = 8.735 us, total = 355.449 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 4738 total (0 active), Execution time: mean = 97.232 us, total = 460.685 ms, Queueing time: mean = 112.536 us, max = 2.934 ms, min = 4.027 us, total = 533.195 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 4738 total (0 active), Execution time: mean = 611.724 us, total = 2.898 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1581 total (1 active), Execution time: mean = 10.086 us, total = 15.947 ms, Queueing time: mean = 75.943 us, max = 442.307 us, min = 14.043 us, total = 120.066 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 948 total (0 active), Execution time: mean = 1.599 ms, total = 1.516 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 948 total (1 active), Execution time: mean = 584.850 us, total = 554.438 ms, Queueing time: mean = 344.218 us, max = 2.010 ms, min = 9.115 us, total = 326.319 ms [state-dump] NodeManager.GcsCheckAlive - 948 total (1 active), Execution time: mean = 313.808 us, total = 297.490 ms, Queueing time: mean = 614.777 us, max = 2.567 ms, min = 6.690 us, total = 582.809 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 948 total (0 active), Execution time: mean = 55.544 us, total = 52.655 ms, Queueing time: mean = 113.028 us, max = 4.779 ms, min = 11.561 us, total = 107.151 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 474 total (1 active), 
Execution time: mean = 1.781 ms, total = 844.120 ms, Queueing time: mean = 71.349 us, max = 176.385 us, min = 11.609 us, total = 33.819 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 79 total (1 active, 1 running), Execution time: mean = 2.729 ms, total = 215.609 ms, Queueing time: mean = 78.891 us, max = 325.100 us, min = 15.835 us, total = 6.232 ms [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 9 total (1 active), Execution time: mean = 466.401 s, total = 4197.607 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 8 total (0 active), Execution time: mean = 
381.115 us, total = 3.049 ms, Queueing time: mean = 142.114 us, max = 243.371 us, min = 20.299 us, total = 1.137 ms [state-dump] NodeManager.GCTaskFailureReason - 6 total (1 active), Execution time: mean = 7.110 us, total = 42.659 us, Queueing time: mean = 57.967 us, max = 97.290 us, min = 24.344 us, total = 347.804 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:21:54,927 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:21:56,152 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 418931 total (35 active) [state-dump] Queueing time: mean = 228.083 us, 
max = 59.826 s, min = -0.001 s, total = 95.551 s [state-dump] Execution time: mean = 11.632 ms, total = 4873.097 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 100758 total (0 active), Execution time: mean = 531.431 us, total = 53.546 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 100758 total (0 active), Execution time: mean = 35.400 us, total = 3.567 s, Queueing time: mean = 110.901 us, max = 3.003 ms, min = 1.846 us, total = 11.174 s [state-dump] ObjectManager.UpdateAvailableMemory - 47954 total (0 active), Execution time: mean = 6.192 us, total = 296.916 ms, Queueing time: mean = 110.713 us, max = 3.228 ms, min = 2.228 us, total = 5.309 s [state-dump] NodeManager.CheckGC - 47954 total (1 active), Execution time: mean = 2.905 us, total = 139.325 ms, Queueing time: mean = 98.579 us, max = 25.875 ms, min = -0.000 s, total = 4.727 s [state-dump] RaySyncer.OnDemandBroadcasting - 47954 total (1 active), Execution time: mean = 10.747 us, total = 515.360 ms, Queueing time: mean = 91.644 us, max = 25.869 ms, min = 6.166 us, total = 4.395 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 23991 total (1 active), Execution time: mean = 18.681 us, total = 448.170 ms, Queueing time: mean = 76.746 us, max = 26.386 ms, min = -0.001 s, total = 1.841 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 19161 total (1 active), Execution time: mean = 456.679 us, total = 8.750 s, Queueing time: mean = 75.555 us, max = 3.532 ms, min = -0.000 s, total = 1.448 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 4800 total (1 active), Execution time: mean = 3.131 us, total = 15.029 ms, Queueing time: mean = 182.704 us, max = 2.946 ms, min = 3.845 us, total = 876.980 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 4800 total (1 active), Execution 
time: mean = 9.279 us, total = 44.540 ms, Queueing time: mean = 178.508 us, max = 2.947 ms, min = 3.811 us, total = 856.837 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 4800 total (1 active), Execution time: mean = 16.849 us, total = 80.876 ms, Queueing time: mean = 75.075 us, max = 2.581 ms, min = 8.735 us, total = 360.360 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 4798 total (0 active), Execution time: mean = 97.279 us, total = 466.743 ms, Queueing time: mean = 112.499 us, max = 2.934 ms, min = 4.027 us, total = 539.770 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 4798 total (0 active), Execution time: mean = 611.944 us, total = 2.936 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1601 total (1 active), Execution time: mean = 10.080 us, total = 16.137 ms, Queueing time: mean = 75.922 us, max = 442.307 us, min = 14.043 us, total = 121.551 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 960 total (0 active), Execution time: mean = 1.600 ms, total = 1.536 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 960 total (1 active), Execution time: mean = 585.609 us, total = 562.185 ms, Queueing time: mean = 342.176 us, max = 2.010 ms, min = 9.115 us, total = 328.489 ms [state-dump] NodeManager.GcsCheckAlive - 960 total (1 active), Execution time: mean = 314.665 us, total = 302.079 ms, Queueing time: mean = 612.878 us, max = 2.567 ms, min = 6.690 us, total = 588.363 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 960 total (0 active), Execution time: mean = 55.602 us, total = 53.378 ms, Queueing time: mean = 113.110 us, max = 4.779 ms, min = 11.561 us, total = 108.586 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 480 total (1 active), 
Execution time: mean = 1.781 ms, total = 854.962 ms, Queueing time: mean = 71.428 us, max = 176.385 us, min = 11.609 us, total = 34.285 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 80 total (1 active, 1 running), Execution time: mean = 2.733 ms, total = 218.616 ms, Queueing time: mean = 78.742 us, max = 325.100 us, min = 15.835 us, total = 6.299 ms [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 10 total (1 active), Execution time: mean = 479.761 s, total = 4797.607 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 9 total (0 active), Execution time: mean = 
379.978 us, total = 3.420 ms, Queueing time: mean = 131.185 us, max = 243.371 us, min = 20.299 us, total = 1.181 ms [state-dump] NodeManager.GCTaskFailureReason - 6 total (1 active), Execution time: mean = 7.110 us, total = 42.659 us, Queueing time: mean = 57.967 us, max = 97.290 us, min = 24.344 us, total = 347.804 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:22:54,927 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:22:56,155 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 424164 total (35 active) [state-dump] Queueing time: mean = 226.257 us, 
max = 59.826 s, min = -0.001 s, total = 95.970 s [state-dump] Execution time: mean = 11.491 ms, total = 4874.044 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 102018 total (0 active), Execution time: mean = 531.542 us, total = 54.227 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 102018 total (0 active), Execution time: mean = 35.412 us, total = 3.613 s, Queueing time: mean = 110.919 us, max = 3.003 ms, min = 1.846 us, total = 11.316 s [state-dump] ObjectManager.UpdateAvailableMemory - 48554 total (0 active), Execution time: mean = 6.199 us, total = 300.981 ms, Queueing time: mean = 110.756 us, max = 3.228 ms, min = 2.228 us, total = 5.378 s [state-dump] NodeManager.CheckGC - 48554 total (1 active), Execution time: mean = 2.907 us, total = 141.151 ms, Queueing time: mean = 98.636 us, max = 25.875 ms, min = -0.000 s, total = 4.789 s [state-dump] RaySyncer.OnDemandBroadcasting - 48554 total (1 active), Execution time: mean = 10.760 us, total = 522.433 ms, Queueing time: mean = 91.691 us, max = 25.869 ms, min = 6.166 us, total = 4.452 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 24290 total (1 active), Execution time: mean = 18.692 us, total = 454.028 ms, Queueing time: mean = 76.777 us, max = 26.386 ms, min = -0.001 s, total = 1.865 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 19400 total (1 active), Execution time: mean = 456.812 us, total = 8.862 s, Queueing time: mean = 75.550 us, max = 3.532 ms, min = -0.000 s, total = 1.466 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 4860 total (1 active), Execution time: mean = 3.130 us, total = 15.214 ms, Queueing time: mean = 182.691 us, max = 2.946 ms, min = 3.845 us, total = 887.877 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 4860 total (1 active), Execution 
time: mean = 9.286 us, total = 45.129 ms, Queueing time: mean = 178.490 us, max = 2.947 ms, min = 3.811 us, total = 867.459 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 4860 total (1 active), Execution time: mean = 16.829 us, total = 81.789 ms, Queueing time: mean = 75.245 us, max = 2.581 ms, min = 8.735 us, total = 365.689 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 4858 total (0 active), Execution time: mean = 97.282 us, total = 472.594 ms, Queueing time: mean = 112.500 us, max = 2.934 ms, min = 4.027 us, total = 546.523 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 4858 total (0 active), Execution time: mean = 611.977 us, total = 2.973 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1621 total (1 active), Execution time: mean = 10.095 us, total = 16.364 ms, Queueing time: mean = 76.061 us, max = 442.307 us, min = 14.043 us, total = 123.294 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 972 total (0 active), Execution time: mean = 1.601 ms, total = 1.556 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 972 total (1 active), Execution time: mean = 585.622 us, total = 569.224 ms, Queueing time: mean = 342.295 us, max = 2.010 ms, min = 9.115 us, total = 332.711 ms [state-dump] NodeManager.GcsCheckAlive - 972 total (1 active), Execution time: mean = 315.166 us, total = 306.341 ms, Queueing time: mean = 612.279 us, max = 2.567 ms, min = 6.690 us, total = 595.135 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 972 total (0 active), Execution time: mean = 55.649 us, total = 54.091 ms, Queueing time: mean = 113.055 us, max = 4.779 ms, min = 11.561 us, total = 109.889 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 486 total (1 active), 
Execution time: mean = 1.782 ms, total = 865.932 ms, Queueing time: mean = 71.474 us, max = 176.385 us, min = 11.609 us, total = 34.736 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 81 total (1 active, 1 running), Execution time: mean = 2.736 ms, total = 221.586 ms, Queueing time: mean = 78.793 us, max = 325.100 us, min = 15.835 us, total = 6.382 ms [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 10 total (1 active), Execution time: mean = 479.761 s, total = 4797.607 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 9 total (0 active), Execution time: mean = 
379.978 us, total = 3.420 ms, Queueing time: mean = 131.185 us, max = 243.371 us, min = 20.299 us, total = 1.181 ms [state-dump] NodeManager.GCTaskFailureReason - 6 total (1 active), Execution time: mean = 7.110 us, total = 42.659 us, Queueing time: mean = 57.967 us, max = 97.290 us, min = 24.344 us, total = 347.804 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:23:54,928 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:23:56,158 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 429396 total (35 active) [state-dump] Queueing time: mean = 224.488 us, 
max = 59.826 s, min = -0.001 s, total = 96.394 s [state-dump] Execution time: mean = 11.353 ms, total = 4874.982 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 103278 total (0 active), Execution time: mean = 531.535 us, total = 54.896 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 103278 total (0 active), Execution time: mean = 35.432 us, total = 3.659 s, Queueing time: mean = 110.896 us, max = 3.003 ms, min = 1.846 us, total = 11.453 s [state-dump] ObjectManager.UpdateAvailableMemory - 49153 total (0 active), Execution time: mean = 6.208 us, total = 305.153 ms, Queueing time: mean = 110.846 us, max = 3.228 ms, min = 2.228 us, total = 5.448 s [state-dump] NodeManager.CheckGC - 49153 total (1 active), Execution time: mean = 2.909 us, total = 142.999 ms, Queueing time: mean = 98.772 us, max = 25.875 ms, min = -0.000 s, total = 4.855 s [state-dump] RaySyncer.OnDemandBroadcasting - 49153 total (1 active), Execution time: mean = 10.776 us, total = 529.668 ms, Queueing time: mean = 91.814 us, max = 25.869 ms, min = 6.166 us, total = 4.513 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 24590 total (1 active), Execution time: mean = 18.699 us, total = 459.813 ms, Queueing time: mean = 76.746 us, max = 26.386 ms, min = -0.001 s, total = 1.887 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 19640 total (1 active), Execution time: mean = 457.058 us, total = 8.977 s, Queueing time: mean = 75.580 us, max = 3.532 ms, min = -0.000 s, total = 1.484 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 4920 total (1 active), Execution time: mean = 3.136 us, total = 15.430 ms, Queueing time: mean = 182.756 us, max = 2.946 ms, min = 3.845 us, total = 899.160 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 4920 total (1 active), Execution 
time: mean = 9.309 us, total = 45.800 ms, Queueing time: mean = 178.548 us, max = 2.947 ms, min = 3.811 us, total = 878.458 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 4920 total (1 active), Execution time: mean = 16.834 us, total = 82.822 ms, Queueing time: mean = 75.317 us, max = 2.581 ms, min = 8.735 us, total = 370.561 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 4918 total (0 active), Execution time: mean = 97.243 us, total = 478.243 ms, Queueing time: mean = 112.428 us, max = 2.934 ms, min = 4.027 us, total = 552.922 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 4918 total (0 active), Execution time: mean = 611.674 us, total = 3.008 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1641 total (1 active), Execution time: mean = 10.091 us, total = 16.559 ms, Queueing time: mean = 76.018 us, max = 442.307 us, min = 14.043 us, total = 124.746 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 984 total (0 active), Execution time: mean = 1.601 ms, total = 1.576 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 984 total (1 active), Execution time: mean = 585.645 us, total = 576.274 ms, Queueing time: mean = 342.741 us, max = 2.010 ms, min = 9.115 us, total = 337.257 ms [state-dump] NodeManager.GcsCheckAlive - 984 total (1 active), Execution time: mean = 315.614 us, total = 310.564 ms, Queueing time: mean = 612.250 us, max = 2.567 ms, min = 6.690 us, total = 602.454 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 984 total (0 active), Execution time: mean = 55.695 us, total = 54.803 ms, Queueing time: mean = 112.939 us, max = 4.779 ms, min = 11.561 us, total = 111.132 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 492 total (1 active), 
Execution time: mean = 1.783 ms, total = 877.050 ms, Queueing time: mean = 71.520 us, max = 176.385 us, min = 11.609 us, total = 35.188 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 82 total (1 active, 1 running), Execution time: mean = 2.738 ms, total = 224.524 ms, Queueing time: mean = 78.896 us, max = 325.100 us, min = 15.835 us, total = 6.469 ms [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 10 total (1 active), Execution time: mean = 479.761 s, total = 4797.607 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 9 total (0 active), Execution time: mean = 
379.978 us, total = 3.420 ms, Queueing time: mean = 131.185 us, max = 243.371 us, min = 20.299 us, total = 1.181 ms [state-dump] NodeManager.GCTaskFailureReason - 6 total (1 active), Execution time: mean = 7.110 us, total = 42.659 us, Queueing time: mean = 57.967 us, max = 97.290 us, min = 24.344 us, total = 347.804 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:24:54,928 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:24:56,160 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 434630 total (35 active) [state-dump] Queueing time: mean = 222.752 us, 
max = 59.826 s, min = -0.001 s, total = 96.815 s [state-dump] Execution time: mean = 11.219 ms, total = 4875.941 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 104538 total (0 active), Execution time: mean = 531.734 us, total = 55.586 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 104538 total (0 active), Execution time: mean = 35.454 us, total = 3.706 s, Queueing time: mean = 110.951 us, max = 3.003 ms, min = 1.846 us, total = 11.599 s [state-dump] ObjectManager.UpdateAvailableMemory - 49753 total (0 active), Execution time: mean = 6.219 us, total = 309.391 ms, Queueing time: mean = 110.897 us, max = 3.228 ms, min = 2.228 us, total = 5.517 s [state-dump] NodeManager.CheckGC - 49753 total (1 active), Execution time: mean = 2.911 us, total = 144.813 ms, Queueing time: mean = 98.796 us, max = 25.875 ms, min = -0.000 s, total = 4.915 s [state-dump] RaySyncer.OnDemandBroadcasting - 49753 total (1 active), Execution time: mean = 10.789 us, total = 536.789 ms, Queueing time: mean = 91.827 us, max = 25.869 ms, min = 6.166 us, total = 4.569 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 24890 total (1 active), Execution time: mean = 18.737 us, total = 466.353 ms, Queueing time: mean = 76.835 us, max = 26.386 ms, min = -0.001 s, total = 1.912 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 19879 total (1 active), Execution time: mean = 457.157 us, total = 9.088 s, Queueing time: mean = 75.612 us, max = 3.532 ms, min = -0.000 s, total = 1.503 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 4980 total (1 active), Execution time: mean = 3.139 us, total = 15.631 ms, Queueing time: mean = 182.620 us, max = 2.946 ms, min = 3.845 us, total = 909.445 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 4980 total (1 active), Execution 
time: mean = 9.312 us, total = 46.374 ms, Queueing time: mean = 178.411 us, max = 2.947 ms, min = 3.811 us, total = 888.487 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 4980 total (1 active), Execution time: mean = 16.874 us, total = 84.032 ms, Queueing time: mean = 75.335 us, max = 2.581 ms, min = 8.735 us, total = 375.168 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 4978 total (0 active), Execution time: mean = 97.285 us, total = 484.287 ms, Queueing time: mean = 112.607 us, max = 2.934 ms, min = 4.027 us, total = 560.559 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 4978 total (0 active), Execution time: mean = 611.911 us, total = 3.046 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1661 total (1 active), Execution time: mean = 10.104 us, total = 16.783 ms, Queueing time: mean = 76.094 us, max = 442.307 us, min = 14.043 us, total = 126.391 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 996 total (0 active), Execution time: mean = 1.602 ms, total = 1.596 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 996 total (1 active), Execution time: mean = 585.500 us, total = 583.158 ms, Queueing time: mean = 342.098 us, max = 2.010 ms, min = 9.115 us, total = 340.730 ms [state-dump] NodeManager.GcsCheckAlive - 996 total (1 active), Execution time: mean = 316.003 us, total = 314.739 ms, Queueing time: mean = 611.113 us, max = 2.567 ms, min = 6.690 us, total = 608.668 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 996 total (0 active), Execution time: mean = 55.706 us, total = 55.484 ms, Queueing time: mean = 112.861 us, max = 4.779 ms, min = 11.561 us, total = 112.409 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 498 total (1 active), 
Execution time: mean = 1.781 ms, total = 887.094 ms, Queueing time: mean = 71.406 us, max = 176.385 us, min = 11.609 us, total = 35.560 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 83 total (1 active, 1 running), Execution time: mean = 2.737 ms, total = 227.185 ms, Queueing time: mean = 79.146 us, max = 325.100 us, min = 15.835 us, total = 6.569 ms [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 
204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 10 total (1 active), Execution time: mean = 479.761 s, total = 4797.607 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 9 total (0 active), Execution time: mean = 
379.978 us, total = 3.420 ms, Queueing time: mean = 131.185 us, max = 243.371 us, min = 20.299 us, total = 1.181 ms [state-dump] NodeManager.GCTaskFailureReason - 6 total (1 active), Execution time: mean = 7.110 us, total = 42.659 us, Queueing time: mean = 57.967 us, max = 97.290 us, min = 24.344 us, total = 347.804 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:25:54,929 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:25:56,162 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 439862 total (35 active) [state-dump] Queueing time: mean = 221.071 us, 
max = 59.826 s, min = -0.001 s, total = 97.241 s [state-dump] Execution time: mean = 11.087 ms, total = 4876.900 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 105798 total (0 active), Execution time: mean = 531.913 us, total = 56.275 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 105798 total (0 active), Execution time: mean = 35.472 us, total = 3.753 s, Queueing time: mean = 110.979 us, max = 3.003 ms, min = 1.846 us, total = 11.741 s [state-dump] ObjectManager.UpdateAvailableMemory - 50352 total (0 active), Execution time: mean = 6.228 us, total = 313.617 ms, Queueing time: mean = 110.925 us, max = 3.228 ms, min = 2.228 us, total = 5.585 s [state-dump] NodeManager.CheckGC - 50352 total (1 active), Execution time: mean = 2.913 us, total = 146.664 ms, Queueing time: mean = 98.877 us, max = 25.875 ms, min = -0.000 s, total = 4.979 s [state-dump] RaySyncer.OnDemandBroadcasting - 50352 total (1 active), Execution time: mean = 10.811 us, total = 544.332 ms, Queueing time: mean = 91.890 us, max = 25.869 ms, min = 6.166 us, total = 4.627 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 25190 total (1 active), Execution time: mean = 18.759 us, total = 472.542 ms, Queueing time: mean = 76.924 us, max = 26.386 ms, min = -0.001 s, total = 1.938 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 20119 total (1 active), Execution time: mean = 457.267 us, total = 9.200 s, Queueing time: mean = 75.650 us, max = 3.532 ms, min = -0.000 s, total = 1.522 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 5040 total (1 active), Execution time: mean = 3.140 us, total = 15.828 ms, Queueing time: mean = 182.707 us, max = 2.946 ms, min = 3.845 us, total = 920.844 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 5040 total (1 active), Execution 
time: mean = 9.326 us, total = 47.005 ms, Queueing time: mean = 178.495 us, max = 2.947 ms, min = 3.811 us, total = 899.614 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 5040 total (1 active), Execution time: mean = 16.919 us, total = 85.272 ms, Queueing time: mean = 75.427 us, max = 2.581 ms, min = 8.735 us, total = 380.151 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 5038 total (0 active), Execution time: mean = 97.343 us, total = 490.416 ms, Queueing time: mean = 112.681 us, max = 2.934 ms, min = 4.027 us, total = 567.686 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 5038 total (0 active), Execution time: mean = 612.106 us, total = 3.084 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1681 total (1 active), Execution time: mean = 10.109 us, total = 16.993 ms, Queueing time: mean = 76.063 us, max = 442.307 us, min = 14.043 us, total = 127.862 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1008 total (0 active), Execution time: mean = 1.602 ms, total = 1.615 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 1008 total (1 active), Execution time: mean = 585.821 us, total = 590.507 ms, Queueing time: mean = 342.389 us, max = 2.010 ms, min = 9.115 us, total = 345.128 ms [state-dump] NodeManager.GcsCheckAlive - 1008 total (1 active), Execution time: mean = 316.469 us, total = 319.001 ms, Queueing time: mean = 611.276 us, max = 2.567 ms, min = 6.690 us, total = 616.166 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1008 total (0 active), Execution time: mean = 55.827 us, total = 56.273 ms, Queueing time: mean = 112.786 us, max = 4.779 ms, min = 11.561 us, total = 113.689 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 504 total (1 
active), Execution time: mean = 1.782 ms, total = 898.273 ms, Queueing time: mean = 71.577 us, max = 176.385 us, min = 11.609 us, total = 36.075 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 84 total (1 active, 1 running), Execution time: mean = 2.732 ms, total = 229.471 ms, Queueing time: mean = 78.867 us, max = 325.100 us, min = 15.835 us, total = 6.625 ms [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: 
mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 10 total (1 active), Execution time: mean = 479.761 s, total = 4797.607 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 9 total (0 active), Execution time: 
mean = 379.978 us, total = 3.420 ms, Queueing time: mean = 131.185 us, max = 243.371 us, min = 20.299 us, total = 1.181 ms [state-dump] NodeManager.GCTaskFailureReason - 6 total (1 active), Execution time: mean = 7.110 us, total = 42.659 us, Queueing time: mean = 57.967 us, max = 97.290 us, min = 24.344 us, total = 347.804 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:26:54,929 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:26:56,166 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 445093 total (35 active) [state-dump] Queueing time: mean = 219.435 us, 
max = 59.826 s, min = -0.001 s, total = 97.669 s [state-dump] Execution time: mean = 10.959 ms, total = 4877.853 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 107058 total (0 active), Execution time: mean = 532.063 us, total = 56.962 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 107058 total (0 active), Execution time: mean = 35.480 us, total = 3.798 s, Queueing time: mean = 111.031 us, max = 3.003 ms, min = 1.846 us, total = 11.887 s [state-dump] ObjectManager.UpdateAvailableMemory - 50951 total (0 active), Execution time: mean = 6.237 us, total = 317.777 ms, Queueing time: mean = 110.971 us, max = 3.228 ms, min = 2.228 us, total = 5.654 s [state-dump] NodeManager.CheckGC - 50951 total (1 active), Execution time: mean = 2.915 us, total = 148.503 ms, Queueing time: mean = 98.975 us, max = 25.875 ms, min = -0.000 s, total = 5.043 s [state-dump] RaySyncer.OnDemandBroadcasting - 50951 total (1 active), Execution time: mean = 10.828 us, total = 551.702 ms, Queueing time: mean = 91.974 us, max = 25.869 ms, min = 6.166 us, total = 4.686 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 25490 total (1 active), Execution time: mean = 18.763 us, total = 478.270 ms, Queueing time: mean = 76.953 us, max = 26.386 ms, min = -0.001 s, total = 1.962 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 20358 total (1 active), Execution time: mean = 457.422 us, total = 9.312 s, Queueing time: mean = 75.682 us, max = 3.532 ms, min = -0.000 s, total = 1.541 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 5100 total (1 active), Execution time: mean = 3.141 us, total = 16.019 ms, Queueing time: mean = 182.670 us, max = 2.946 ms, min = 3.845 us, total = 931.617 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 5100 total (1 active), Execution 
time: mean = 9.327 us, total = 47.568 ms, Queueing time: mean = 178.459 us, max = 2.947 ms, min = 3.811 us, total = 910.138 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 5100 total (1 active), Execution time: mean = 16.948 us, total = 86.435 ms, Queueing time: mean = 75.416 us, max = 2.581 ms, min = 8.735 us, total = 384.619 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 5098 total (0 active), Execution time: mean = 97.410 us, total = 496.598 ms, Queueing time: mean = 112.783 us, max = 2.934 ms, min = 4.027 us, total = 574.966 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 5098 total (0 active), Execution time: mean = 612.389 us, total = 3.122 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1701 total (1 active), Execution time: mean = 10.102 us, total = 17.184 ms, Queueing time: mean = 76.038 us, max = 442.307 us, min = 14.043 us, total = 129.341 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1020 total (0 active), Execution time: mean = 1.603 ms, total = 1.635 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 1020 total (1 active), Execution time: mean = 584.973 us, total = 596.672 ms, Queueing time: mean = 342.963 us, max = 2.010 ms, min = 9.115 us, total = 349.822 ms [state-dump] NodeManager.GcsCheckAlive - 1020 total (1 active), Execution time: mean = 316.604 us, total = 322.936 ms, Queueing time: mean = 610.796 us, max = 2.567 ms, min = 6.690 us, total = 623.012 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1020 total (0 active), Execution time: mean = 55.805 us, total = 56.921 ms, Queueing time: mean = 112.694 us, max = 4.779 ms, min = 11.561 us, total = 114.948 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 510 total (1 
active), Execution time: mean = 1.783 ms, total = 909.076 ms, Queueing time: mean = 71.524 us, max = 176.385 us, min = 11.609 us, total = 36.477 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 85 total (1 active, 1 running), Execution time: mean = 2.726 ms, total = 231.705 ms, Queueing time: mean = 78.758 us, max = 325.100 us, min = 15.835 us, total = 6.694 ms [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: 
mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 10 total (1 active), Execution time: mean = 479.761 s, total = 4797.607 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 9 total (0 active), Execution time: 
mean = 379.978 us, total = 3.420 ms, Queueing time: mean = 131.185 us, max = 243.371 us, min = 20.299 us, total = 1.181 ms [state-dump] NodeManager.GCTaskFailureReason - 6 total (1 active), Execution time: mean = 7.110 us, total = 42.659 us, Queueing time: mean = 57.967 us, max = 97.290 us, min = 24.344 us, total = 347.804 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 00:27:54,929 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:27:56,169 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 450328 total (35 active) [state-dump] Queueing time: mean = 217.848 us, 
max = 59.826 s, min = -0.001 s, total = 98.103 s [state-dump] Execution time: mean = 10.834 ms, total = 4878.823 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 108318 total (0 active), Execution time: mean = 532.271 us, total = 57.655 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 108318 total (0 active), Execution time: mean = 35.512 us, total = 3.847 s, Queueing time: mean = 111.105 us, max = 3.003 ms, min = 1.846 us, total = 12.035 s [state-dump] ObjectManager.UpdateAvailableMemory - 51551 total (0 active), Execution time: mean = 6.248 us, total = 322.069 ms, Queueing time: mean = 111.070 us, max = 3.228 ms, min = 2.228 us, total = 5.726 s [state-dump] NodeManager.CheckGC - 51551 total (1 active), Execution time: mean = 2.917 us, total = 150.361 ms, Queueing time: mean = 99.010 us, max = 25.875 ms, min = -0.000 s, total = 5.104 s [state-dump] RaySyncer.OnDemandBroadcasting - 51551 total (1 active), Execution time: mean = 10.844 us, total = 559.033 ms, Queueing time: mean = 91.995 us, max = 25.869 ms, min = 6.166 us, total = 4.742 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 25790 total (1 active), Execution time: mean = 18.791 us, total = 484.610 ms, Queueing time: mean = 76.999 us, max = 26.386 ms, min = -0.001 s, total = 1.986 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 20598 total (1 active), Execution time: mean = 457.678 us, total = 9.427 s, Queueing time: mean = 75.729 us, max = 3.532 ms, min = -0.000 s, total = 1.560 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 5160 total (1 active), Execution time: mean = 3.143 us, total = 16.220 ms, Queueing time: mean = 182.916 us, max = 2.946 ms, min = 3.845 us, total = 943.845 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 5160 total (1 active), Execution 
time: mean = 9.356 us, total = 48.277 ms, Queueing time: mean = 178.685 us, max = 2.947 ms, min = 3.811 us, total = 922.014 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 5160 total (1 active), Execution time: mean = 16.999 us, total = 87.713 ms, Queueing time: mean = 75.493 us, max = 2.581 ms, min = 8.735 us, total = 389.546 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 5158 total (0 active), Execution time: mean = 97.473 us, total = 502.767 ms, Queueing time: mean = 112.943 us, max = 2.934 ms, min = 4.027 us, total = 582.561 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 5158 total (0 active), Execution time: mean = 612.765 us, total = 3.161 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1721 total (1 active), Execution time: mean = 10.108 us, total = 17.396 ms, Queueing time: mean = 76.052 us, max = 442.307 us, min = 14.043 us, total = 130.886 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1032 total (0 active), Execution time: mean = 1.603 ms, total = 1.655 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 1032 total (1 active), Execution time: mean = 585.140 us, total = 603.864 ms, Queueing time: mean = 343.962 us, max = 2.010 ms, min = 9.115 us, total = 354.968 ms [state-dump] NodeManager.GcsCheckAlive - 1032 total (1 active), Execution time: mean = 317.160 us, total = 327.309 ms, Queueing time: mean = 611.515 us, max = 2.567 ms, min = 6.690 us, total = 631.084 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1032 total (0 active), Execution time: mean = 55.847 us, total = 57.634 ms, Queueing time: mean = 112.929 us, max = 4.779 ms, min = 11.561 us, total = 116.542 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 516 total (1 
active), Execution time: mean = 1.784 ms, total = 920.377 ms, Queueing time: mean = 71.581 us, max = 176.385 us, min = 11.609 us, total = 36.936 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 86 total (1 active, 1 running), Execution time: mean = 2.729 ms, total = 234.660 ms, Queueing time: mean = 79.028 us, max = 325.100 us, min = 15.835 us, total = 6.796 ms [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: 
mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 10 total (1 active), Execution time: mean = 479.761 s, total = 4797.607 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 9 total (0 active), Execution time: 
mean = 379.978 us, total = 3.420 ms, Queueing time: mean = 131.185 us, max = 243.371 us, min = 20.299 us, total = 1.181 ms [state-dump] NodeManager.GCTaskFailureReason - 6 total (1 active), Execution time: mean = 7.110 us, total = 42.659 us, Queueing time: mean = 57.967 us, max = 97.290 us, min = 24.344 us, total = 347.804 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 00:28:54,930 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:28:56,171 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 455559 total (35 active) [state-dump] Queueing time: mean = 216.303 us, 
max = 59.826 s, min = -0.001 s, total = 98.539 s [state-dump] Execution time: mean = 10.712 ms, total = 4879.771 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 109578 total (0 active), Execution time: mean = 532.306 us, total = 58.329 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 109578 total (0 active), Execution time: mean = 35.541 us, total = 3.894 s, Queueing time: mean = 111.165 us, max = 3.003 ms, min = 1.846 us, total = 12.181 s [state-dump] ObjectManager.UpdateAvailableMemory - 52150 total (0 active), Execution time: mean = 6.258 us, total = 326.372 ms, Queueing time: mean = 111.169 us, max = 3.228 ms, min = 2.228 us, total = 5.797 s [state-dump] NodeManager.CheckGC - 52150 total (1 active), Execution time: mean = 2.919 us, total = 152.216 ms, Queueing time: mean = 99.123 us, max = 25.875 ms, min = -0.000 s, total = 5.169 s [state-dump] RaySyncer.OnDemandBroadcasting - 52150 total (1 active), Execution time: mean = 10.861 us, total = 566.424 ms, Queueing time: mean = 92.095 us, max = 25.869 ms, min = 6.166 us, total = 4.803 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 26090 total (1 active), Execution time: mean = 18.819 us, total = 490.999 ms, Queueing time: mean = 77.040 us, max = 26.386 ms, min = -0.001 s, total = 2.010 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 20837 total (1 active), Execution time: mean = 457.967 us, total = 9.543 s, Queueing time: mean = 75.800 us, max = 3.532 ms, min = -0.000 s, total = 1.579 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 5220 total (1 active), Execution time: mean = 3.149 us, total = 16.437 ms, Queueing time: mean = 182.863 us, max = 2.946 ms, min = 3.845 us, total = 954.542 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 5220 total (1 active), Execution 
time: mean = 9.374 us, total = 48.933 ms, Queueing time: mean = 178.624 us, max = 2.947 ms, min = 3.811 us, total = 932.419 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 5220 total (1 active), Execution time: mean = 17.026 us, total = 88.874 ms, Queueing time: mean = 75.606 us, max = 2.581 ms, min = 8.735 us, total = 394.662 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 5218 total (0 active), Execution time: mean = 97.495 us, total = 508.729 ms, Queueing time: mean = 113.051 us, max = 2.934 ms, min = 4.027 us, total = 589.900 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 5218 total (0 active), Execution time: mean = 612.675 us, total = 3.197 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1741 total (1 active), Execution time: mean = 10.127 us, total = 17.631 ms, Queueing time: mean = 76.142 us, max = 442.307 us, min = 14.043 us, total = 132.563 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1044 total (0 active), Execution time: mean = 1.605 ms, total = 1.675 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 1044 total (1 active), Execution time: mean = 585.022 us, total = 610.763 ms, Queueing time: mean = 343.844 us, max = 2.010 ms, min = 9.115 us, total = 358.974 ms [state-dump] NodeManager.GcsCheckAlive - 1044 total (1 active), Execution time: mean = 317.452 us, total = 331.420 ms, Queueing time: mean = 610.974 us, max = 2.567 ms, min = 6.690 us, total = 637.856 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1044 total (0 active), Execution time: mean = 55.886 us, total = 58.345 ms, Queueing time: mean = 112.978 us, max = 4.779 ms, min = 11.561 us, total = 117.949 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 522 total (1 
active), Execution time: mean = 1.783 ms, total = 930.792 ms, Queueing time: mean = 71.956 us, max = 180.398 us, min = 11.609 us, total = 37.561 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 87 total (1 active, 1 running), Execution time: mean = 2.732 ms, total = 237.643 ms, Queueing time: mean = 78.888 us, max = 325.100 us, min = 15.835 us, total = 6.863 ms [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: 
mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 10 total (1 active), Execution time: mean = 479.761 s, total = 4797.607 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 9 total (0 active), Execution time: 
mean = 379.978 us, total = 3.420 ms, Queueing time: mean = 131.185 us, max = 243.371 us, min = 20.299 us, total = 1.181 ms [state-dump] NodeManager.GCTaskFailureReason - 6 total (1 active), Execution time: mean = 7.110 us, total = 42.659 us, Queueing time: mean = 57.967 us, max = 97.290 us, min = 24.344 us, total = 347.804 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:29:54,930 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:29:56,174 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 460791 total (35 active) [state-dump] Queueing time: mean = 214.763 us, 
max = 59.826 s, min = -0.001 s, total = 98.961 s [state-dump] Execution time: mean = 10.592 ms, total = 4880.701 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 110838 total (0 active), Execution time: mean = 532.317 us, total = 59.001 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 110838 total (0 active), Execution time: mean = 35.535 us, total = 3.939 s, Queueing time: mean = 111.220 us, max = 3.003 ms, min = 1.846 us, total = 12.327 s [state-dump] ObjectManager.UpdateAvailableMemory - 52749 total (0 active), Execution time: mean = 6.262 us, total = 330.331 ms, Queueing time: mean = 111.186 us, max = 3.228 ms, min = 2.228 us, total = 5.865 s [state-dump] NodeManager.CheckGC - 52749 total (1 active), Execution time: mean = 2.920 us, total = 154.023 ms, Queueing time: mean = 99.138 us, max = 25.875 ms, min = -0.000 s, total = 5.229 s [state-dump] RaySyncer.OnDemandBroadcasting - 52749 total (1 active), Execution time: mean = 10.871 us, total = 573.458 ms, Queueing time: mean = 92.101 us, max = 25.869 ms, min = 6.166 us, total = 4.858 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 26390 total (1 active), Execution time: mean = 18.815 us, total = 496.532 ms, Queueing time: mean = 77.056 us, max = 26.386 ms, min = -0.001 s, total = 2.034 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 21077 total (1 active), Execution time: mean = 457.991 us, total = 9.653 s, Queueing time: mean = 76.027 us, max = 4.063 ms, min = -0.000 s, total = 1.602 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 5280 total (1 active), Execution time: mean = 3.150 us, total = 16.633 ms, Queueing time: mean = 182.737 us, max = 2.946 ms, min = 3.845 us, total = 964.849 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 5280 total (1 active), Execution 
time: mean = 9.385 us, total = 49.550 ms, Queueing time: mean = 178.494 us, max = 2.947 ms, min = 3.811 us, total = 942.449 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 5280 total (1 active), Execution time: mean = 17.053 us, total = 90.040 ms, Queueing time: mean = 75.658 us, max = 2.581 ms, min = 8.735 us, total = 399.477 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 5278 total (0 active), Execution time: mean = 97.478 us, total = 514.490 ms, Queueing time: mean = 113.138 us, max = 2.934 ms, min = 4.027 us, total = 597.142 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 5278 total (0 active), Execution time: mean = 612.513 us, total = 3.233 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1761 total (1 active), Execution time: mean = 10.123 us, total = 17.827 ms, Queueing time: mean = 76.167 us, max = 442.307 us, min = 14.043 us, total = 134.131 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1056 total (0 active), Execution time: mean = 1.603 ms, total = 1.693 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 1056 total (1 active), Execution time: mean = 584.596 us, total = 617.333 ms, Queueing time: mean = 343.724 us, max = 2.010 ms, min = 9.115 us, total = 362.973 ms [state-dump] NodeManager.GcsCheckAlive - 1056 total (1 active), Execution time: mean = 317.657 us, total = 335.445 ms, Queueing time: mean = 610.188 us, max = 2.567 ms, min = 6.690 us, total = 644.358 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1056 total (0 active), Execution time: mean = 55.915 us, total = 59.046 ms, Queueing time: mean = 113.045 us, max = 4.779 ms, min = 11.561 us, total = 119.376 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 528 total (1 
active), Execution time: mean = 1.782 ms, total = 941.119 ms, Queueing time: mean = 71.972 us, max = 180.398 us, min = 11.609 us, total = 38.001 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 88 total (1 active, 1 running), Execution time: mean = 2.727 ms, total = 239.950 ms, Queueing time: mean = 78.593 us, max = 325.100 us, min = 15.835 us, total = 6.916 ms [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: 
mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 10 total (1 active), Execution time: mean = 479.761 s, total = 4797.607 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 9 total (0 active), Execution time: 
mean = 379.978 us, total = 3.420 ms, Queueing time: mean = 131.185 us, max = 243.371 us, min = 20.299 us, total = 1.181 ms [state-dump] NodeManager.GCTaskFailureReason - 6 total (1 active), Execution time: mean = 7.110 us, total = 42.659 us, Queueing time: mean = 57.967 us, max = 97.290 us, min = 24.344 us, total = 347.804 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:30:54,930 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:30:56,177 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = 
-nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 466024 total (35 active) [state-dump] Queueing time: mean = 213.260 us, 
max = 59.826 s, min = -0.001 s, total = 99.384 s [state-dump] Execution time: mean = 10.475 ms, total = 4881.669 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 112098 total (0 active), Execution time: mean = 532.599 us, total = 59.703 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 112098 total (0 active), Execution time: mean = 35.535 us, total = 3.983 s, Queueing time: mean = 111.250 us, max = 3.003 ms, min = 1.846 us, total = 12.471 s [state-dump] ObjectManager.UpdateAvailableMemory - 53349 total (0 active), Execution time: mean = 6.266 us, total = 334.263 ms, Queueing time: mean = 111.244 us, max = 3.228 ms, min = 2.228 us, total = 5.935 s [state-dump] NodeManager.CheckGC - 53349 total (1 active), Execution time: mean = 2.920 us, total = 155.772 ms, Queueing time: mean = 99.156 us, max = 25.875 ms, min = -0.000 s, total = 5.290 s [state-dump] RaySyncer.OnDemandBroadcasting - 53349 total (1 active), Execution time: mean = 10.871 us, total = 579.970 ms, Queueing time: mean = 92.119 us, max = 25.869 ms, min = 6.166 us, total = 4.914 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 26689 total (1 active), Execution time: mean = 18.819 us, total = 502.270 ms, Queueing time: mean = 77.100 us, max = 26.386 ms, min = -0.001 s, total = 2.058 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 21316 total (1 active), Execution time: mean = 458.107 us, total = 9.765 s, Queueing time: mean = 76.024 us, max = 4.063 ms, min = -0.000 s, total = 1.621 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 5340 total (1 active), Execution time: mean = 3.150 us, total = 16.821 ms, Queueing time: mean = 182.914 us, max = 2.946 ms, min = 3.845 us, total = 976.762 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 5340 total (1 active), Execution 
time: mean = 9.386 us, total = 50.121 ms, Queueing time: mean = 178.670 us, max = 2.947 ms, min = 3.811 us, total = 954.096 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 5340 total (1 active), Execution time: mean = 17.054 us, total = 91.069 ms, Queueing time: mean = 75.722 us, max = 2.581 ms, min = 8.735 us, total = 404.357 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 5338 total (0 active), Execution time: mean = 97.473 us, total = 520.310 ms, Queueing time: mean = 113.154 us, max = 2.934 ms, min = 4.027 us, total = 604.015 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 5338 total (0 active), Execution time: mean = 612.645 us, total = 3.270 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1781 total (1 active), Execution time: mean = 10.132 us, total = 18.045 ms, Queueing time: mean = 76.136 us, max = 442.307 us, min = 14.043 us, total = 135.598 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1068 total (0 active), Execution time: mean = 1.603 ms, total = 1.712 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 1068 total (1 active), Execution time: mean = 584.921 us, total = 624.696 ms, Queueing time: mean = 344.226 us, max = 2.010 ms, min = 9.115 us, total = 367.633 ms [state-dump] NodeManager.GcsCheckAlive - 1068 total (1 active), Execution time: mean = 317.801 us, total = 339.412 ms, Queueing time: mean = 610.905 us, max = 2.567 ms, min = 6.690 us, total = 652.446 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1068 total (0 active), Execution time: mean = 55.941 us, total = 59.745 ms, Queueing time: mean = 112.920 us, max = 4.779 ms, min = 11.561 us, total = 120.599 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 534 total (1 
active), Execution time: mean = 1.783 ms, total = 951.901 ms, Queueing time: mean = 71.900 us, max = 180.398 us, min = 11.609 us, total = 38.395 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 89 total (1 active, 1 running), Execution time: mean = 2.729 ms, total = 242.843 ms, Queueing time: mean = 78.402 us, max = 325.100 us, min = 15.835 us, total = 6.978 ms [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: 
mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 10 total (1 active), Execution time: mean = 479.761 s, total = 4797.607 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 9 total (0 active), Execution time: 
mean = 379.978 us, total = 3.420 ms, Queueing time: mean = 131.185 us, max = 243.371 us, min = 20.299 us, total = 1.181 ms [state-dump] NodeManager.GCTaskFailureReason - 6 total (1 active), Execution time: mean = 7.110 us, total = 42.659 us, Queueing time: mean = 57.967 us, max = 97.290 us, min = 24.344 us, total = 347.804 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:31:54,930 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:31:55,993 I 13636 13636] (raylet) node_manager.cc:658: Sending Python GC request to 21 local workers to clean up Python cyclic references. [2025-01-21 00:31:56,179 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, 
"labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 
[state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 471300 total (36 active) [state-dump] Queueing time: mean = 211.828 us, 
max = 59.826 s, min = -0.001 s, total = 99.835 s [state-dump] Execution time: mean = 11.636 ms, total = 5484.221 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 113358 total (0 active), Execution time: mean = 532.469 us, total = 60.360 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 113358 total (0 active), Execution time: mean = 35.504 us, total = 4.025 s, Queueing time: mean = 111.139 us, max = 3.003 ms, min = 1.846 us, total = 12.599 s [state-dump] RaySyncer.OnDemandBroadcasting - 53948 total (1 active), Execution time: mean = 10.875 us, total = 586.699 ms, Queueing time: mean = 92.083 us, max = 25.869 ms, min = 6.166 us, total = 4.968 s [state-dump] NodeManager.CheckGC - 53948 total (1 active), Execution time: mean = 3.774 us, total = 203.624 ms, Queueing time: mean = 99.120 us, max = 25.875 ms, min = -0.000 s, total = 5.347 s [state-dump] ObjectManager.UpdateAvailableMemory - 53948 total (0 active), Execution time: mean = 6.264 us, total = 337.931 ms, Queueing time: mean = 111.954 us, max = 45.939 ms, min = 2.228 us, total = 6.040 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 26989 total (1 active), Execution time: mean = 18.819 us, total = 507.905 ms, Queueing time: mean = 77.810 us, max = 26.386 ms, min = -0.001 s, total = 2.100 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 21556 total (1 active), Execution time: mean = 458.150 us, total = 9.876 s, Queueing time: mean = 75.999 us, max = 4.063 ms, min = -0.000 s, total = 1.638 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 5400 total (1 active), Execution time: mean = 9.394 us, total = 50.727 ms, Queueing time: mean = 178.542 us, max = 2.947 ms, min = 3.811 us, total = 964.127 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 5400 total (1 active), Execution time: mean = 17.059 us, 
total = 92.121 ms, Queueing time: mean = 75.698 us, max = 2.581 ms, min = 8.735 us, total = 408.771 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 5400 total (1 active), Execution time: mean = 3.151 us, total = 17.014 ms, Queueing time: mean = 182.790 us, max = 2.946 ms, min = 3.845 us, total = 987.065 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 5398 total (0 active), Execution time: mean = 612.407 us, total = 3.306 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 5398 total (0 active), Execution time: mean = 97.453 us, total = 526.052 ms, Queueing time: mean = 113.034 us, max = 2.934 ms, min = 4.027 us, total = 610.157 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1801 total (1 active), Execution time: mean = 10.120 us, total = 18.227 ms, Queueing time: mean = 76.044 us, max = 442.307 us, min = 14.043 us, total = 136.955 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1080 total (1 active), Execution time: mean = 584.921 us, total = 631.714 ms, Queueing time: mean = 343.681 us, max = 2.010 ms, min = 9.115 us, total = 371.176 ms [state-dump] NodeManager.GcsCheckAlive - 1080 total (1 active), Execution time: mean = 318.131 us, total = 343.581 ms, Queueing time: mean = 610.018 us, max = 2.567 ms, min = 6.690 us, total = 658.820 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1080 total (0 active), Execution time: mean = 55.921 us, total = 60.394 ms, Queueing time: mean = 112.846 us, max = 4.779 ms, min = 11.561 us, total = 121.874 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1080 total (0 active), Execution time: mean = 1.603 ms, total = 1.731 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 540 total (1 
active), Execution time: mean = 1.783 ms, total = 962.801 ms, Queueing time: mean = 71.876 us, max = 180.398 us, min = 11.609 us, total = 38.813 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 90 total (1 active, 1 running), Execution time: mean = 2.727 ms, total = 245.470 ms, Queueing time: mean = 78.489 us, max = 325.100 us, min = 15.835 us, total = 7.064 ms [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (1 active), Execution time: mean = 75.713 ms, total = 1.590 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: 
mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 20 total (0 active), Execution time: mean = 63.049 us, total = 1.261 ms, Queueing time: mean = 151.094 us, max = 223.485 us, min = 56.603 us, total = 3.022 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 11 total (1 active), Execution time: mean = 490.692 s, total = 5397.609 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 10 total (0 active), Execution time: mean = 376.502 us, total = 3.765 ms, Queueing time: mean = 122.227 us, max = 243.371 us, min = 20.299 us, total = 1.222 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, 
total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 7 total (1 active), Execution time: mean = 7.313 us, total = 51.189 us, Queueing time: mean = 59.264 us, max = 97.290 us, min = 24.344 us, total = 414.847 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 
total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: 
mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:32:54,931 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:32:56,182 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: 
[21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: 
[state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles 
being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 
[state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 476534 total (35 active) [state-dump] Queueing time: mean = 210.366 us, max = 59.826 s, min = -0.001 s, total = 100.247 s [state-dump] Execution time: mean = 11.511 ms, total = 5485.560 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 114618 total (0 active), Execution time: mean = 532.674 us, total = 61.054 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 114618 total (0 active), Execution time: mean = 35.522 us, total = 4.071 s, Queueing time: mean = 111.201 us, max = 3.003 ms, min = 1.846 us, total = 12.746 s [state-dump] RaySyncer.OnDemandBroadcasting - 54548 total (1 active), Execution time: mean = 10.883 us, total = 593.662 ms, Queueing time: mean = 92.091 us, max = 25.869 ms, min = 6.166 us, total = 5.023 s [state-dump] ObjectManager.UpdateAvailableMemory - 54548 total (0 active), Execution time: mean = 6.272 us, total = 342.144 ms, Queueing time: mean = 112.013 us, max = 45.939 ms, min = 2.228 us, total = 6.110 s [state-dump] NodeManager.CheckGC - 54547 total (1 active), Execution time: mean = 3.835 us, total = 209.202 ms, Queueing time: mean = 98.865 us, max = 25.875 ms, min = -0.000 s, total = 5.393 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 27289 total (1 active), Execution time: mean = 18.831 us, total = 513.875 ms, Queueing time: mean = 77.818 us, max = 26.386 ms, min = -0.001 s, total = 2.124 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 21795 total (1 active), Execution time: mean = 458.328 us, total = 9.989 s, Queueing time: mean = 76.080 us, max = 4.063 ms, min = -0.000 s, total = 1.658 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 5460 total (1 active), Execution time: mean = 9.398 us, total = 51.313 ms, Queueing time: mean = 178.635 
us, max = 2.947 ms, min = 3.811 us, total = 975.347 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 5460 total (1 active), Execution time: mean = 17.088 us, total = 93.301 ms, Queueing time: mean = 75.729 us, max = 2.581 ms, min = 8.735 us, total = 413.483 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 5460 total (1 active), Execution time: mean = 3.152 us, total = 17.209 ms, Queueing time: mean = 182.885 us, max = 2.946 ms, min = 3.845 us, total = 998.551 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 5458 total (0 active), Execution time: mean = 612.542 us, total = 3.343 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 5458 total (0 active), Execution time: mean = 97.483 us, total = 532.065 ms, Queueing time: mean = 113.138 us, max = 2.934 ms, min = 4.027 us, total = 617.508 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1821 total (1 active), Execution time: mean = 10.107 us, total = 18.404 ms, Queueing time: mean = 75.969 us, max = 442.307 us, min = 14.043 us, total = 138.340 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1092 total (1 active), Execution time: mean = 584.962 us, total = 638.778 ms, Queueing time: mean = 344.102 us, max = 2.010 ms, min = 9.115 us, total = 375.760 ms [state-dump] NodeManager.GcsCheckAlive - 1092 total (1 active), Execution time: mean = 318.767 us, total = 348.093 ms, Queueing time: mean = 609.867 us, max = 2.567 ms, min = 6.690 us, total = 665.975 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1092 total (0 active), Execution time: mean = 55.946 us, total = 61.093 ms, Queueing time: mean = 112.772 us, max = 4.779 ms, min = 11.561 us, total = 123.147 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1092 total (0 active), Execution time: mean = 1.603 ms, total = 1.751 s, 
Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 546 total (1 active), Execution time: mean = 1.784 ms, total = 973.837 ms, Queueing time: mean = 71.872 us, max = 180.398 us, min = 11.609 us, total = 39.242 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 91 total (1 active, 1 running), Execution time: mean = 2.729 ms, total = 248.301 ms, Queueing time: mean = 78.363 us, max = 325.100 us, min = 15.835 us, total = 7.131 ms [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = 
-0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 11 total (1 active), Execution time: mean = 490.692 s, total = 5397.609 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 
9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 10 total (0 active), Execution time: mean = 376.502 us, total = 3.765 ms, Queueing time: mean = 122.227 us, max = 243.371 us, min = 20.299 us, total = 1.222 ms [state-dump] NodeManager.GCTaskFailureReason - 7 total (1 active), Execution time: mean = 7.313 us, total = 51.189 us, Queueing time: mean = 59.264 us, max = 97.290 us, min = 24.344 us, total = 414.847 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 
us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us 
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 00:33:54,931 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:33:56,186 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: 
[200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] 
- cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A 
[state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active 
subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 481767 total (35 active) [state-dump] Queueing time: mean = 208.926 us, max = 59.826 s, min = -0.001 s, total = 100.653 s [state-dump] Execution time: mean = 11.388 ms, total = 5486.514 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 115878 total (0 active), Execution time: mean = 532.773 us, total = 61.737 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 115878 total (0 active), Execution time: mean = 35.540 us, total = 4.118 s, Queueing time: mean = 111.240 us, max = 3.003 ms, min = 1.846 us, total = 12.890 s [state-dump] RaySyncer.OnDemandBroadcasting - 55147 total (1 active), Execution time: mean = 10.886 us, total = 600.345 ms, Queueing time: mean = 92.116 us, max = 25.869 ms, min = 6.166 us, total = 5.080 s [state-dump] NodeManager.CheckGC - 55147 total (1 active), Execution time: mean = 3.894 us, total = 214.744 ms, Queueing time: mean = 98.597 us, max = 25.875 ms, min = -0.000 s, total = 5.437 s [state-dump] ObjectManager.UpdateAvailableMemory - 55147 total (0 active), Execution time: mean = 6.277 us, total = 346.152 ms, Queueing time: mean = 112.026 us, max = 45.939 ms, min = 2.228 us, total = 6.178 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 27589 total (1 active), Execution time: mean = 18.838 us, total = 519.729 ms, Queueing time: mean = 77.859 us, max = 26.386 ms, min = -0.001 s, total = 2.148 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 22035 total (1 active), Execution time: mean = 458.392 us, total = 10.101 s, Queueing time: mean = 76.108 us, max = 4.063 ms, min = -0.000 s, total = 1.677 s 
[state-dump] NodeManager.deadline_timer.flush_free_objects - 5520 total (1 active), Execution time: mean = 9.403 us, total = 51.903 ms, Queueing time: mean = 178.739 us, max = 2.947 ms, min = 3.811 us, total = 986.640 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 5520 total (1 active), Execution time: mean = 17.093 us, total = 94.351 ms, Queueing time: mean = 75.670 us, max = 2.581 ms, min = 8.735 us, total = 417.699 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 5520 total (1 active), Execution time: mean = 3.152 us, total = 17.397 ms, Queueing time: mean = 182.991 us, max = 2.946 ms, min = 3.845 us, total = 1.010 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 5518 total (0 active), Execution time: mean = 612.583 us, total = 3.380 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 5518 total (0 active), Execution time: mean = 97.502 us, total = 538.014 ms, Queueing time: mean = 113.143 us, max = 2.934 ms, min = 4.027 us, total = 624.320 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1841 total (1 active), Execution time: mean = 10.114 us, total = 18.619 ms, Queueing time: mean = 76.009 us, max = 442.307 us, min = 14.043 us, total = 139.933 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1104 total (1 active), Execution time: mean = 584.684 us, total = 645.491 ms, Queueing time: mean = 344.827 us, max = 2.010 ms, min = 9.115 us, total = 380.689 ms [state-dump] NodeManager.GcsCheckAlive - 1104 total (1 active), Execution time: mean = 319.074 us, total = 352.258 ms, Queueing time: mean = 610.165 us, max = 2.567 ms, min = 6.690 us, total = 673.622 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1104 total (0 active), Execution time: mean = 55.967 us, total = 61.787 ms, Queueing time: mean = 112.836 us, max = 4.779 ms, min = 11.561 
us, total = 124.571 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1104 total (0 active), Execution time: mean = 1.604 ms, total = 1.771 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 552 total (1 active), Execution time: mean = 1.784 ms, total = 985.041 ms, Queueing time: mean = 72.012 us, max = 180.398 us, min = 11.609 us, total = 39.750 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 92 total (1 active, 1 running), Execution time: mean = 2.731 ms, total = 251.277 ms, Queueing time: mean = 78.188 us, max = 325.100 us, min = 15.835 us, total = 7.193 ms [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms 
[state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 11 total (1 active), Execution time: mean = 490.692 s, total = 5397.609 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] 
NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 10 total (0 active), Execution time: mean = 376.502 us, total = 3.765 ms, Queueing time: mean = 122.227 us, max = 243.371 us, min = 20.299 us, total = 1.222 ms [state-dump] NodeManager.GCTaskFailureReason - 7 total (1 active), Execution time: mean = 7.313 us, total = 51.189 us, Queueing time: mean = 59.264 us, max = 97.290 us, min = 24.344 us, total = 414.847 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, 
min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived 
- 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 00:34:54,931 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:34:56,187 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] 
num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 
0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: 
BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] 
- cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 487001 total (35 active) [state-dump] Queueing time: mean = 207.522 us, max = 59.826 s, min = -0.001 s, total = 101.064 s [state-dump] Execution time: mean = 11.268 ms, total = 5487.477 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 117138 total (0 active), Execution time: mean = 532.943 us, total = 62.428 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 117138 total (0 active), Execution time: mean = 35.548 us, total = 4.164 s, Queueing time: mean = 111.311 us, max = 3.003 ms, min = 1.846 us, total = 13.039 s [state-dump] RaySyncer.OnDemandBroadcasting - 55747 total (1 active), Execution time: mean = 10.883 us, total = 606.673 ms, Queueing time: mean = 92.138 us, max = 25.869 ms, min = 6.166 us, total = 5.136 s [state-dump] NodeManager.CheckGC - 55747 total (1 active), Execution time: mean = 3.942 us, total = 219.758 ms, Queueing time: mean = 98.318 us, max = 25.875 ms, min = -0.000 s, total = 5.481 s [state-dump] ObjectManager.UpdateAvailableMemory - 55747 total (0 active), Execution time: mean = 6.279 us, total = 350.033 ms, Queueing time: mean = 112.080 us, max = 45.939 ms, min = 2.228 us, total = 6.248 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 27889 total (1 active), Execution time: mean = 18.848 us, total = 525.658 ms, Queueing time: mean = 77.850 us, max = 26.386 ms, min = -0.001 s, total = 2.171 s [state-dump] 
MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 22274 total (1 active), Execution time: mean = 458.523 us, total = 10.213 s, Queueing time: mean = 76.097 us, max = 4.063 ms, min = -0.000 s, total = 1.695 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 5580 total (1 active), Execution time: mean = 9.398 us, total = 52.438 ms, Queueing time: mean = 178.782 us, max = 2.947 ms, min = 3.811 us, total = 997.604 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 5580 total (1 active), Execution time: mean = 17.095 us, total = 95.391 ms, Queueing time: mean = 75.742 us, max = 2.581 ms, min = 8.735 us, total = 422.640 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 5580 total (1 active), Execution time: mean = 3.154 us, total = 17.601 ms, Queueing time: mean = 183.029 us, max = 2.946 ms, min = 3.845 us, total = 1.021 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 5578 total (0 active), Execution time: mean = 612.933 us, total = 3.419 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 5578 total (0 active), Execution time: mean = 97.551 us, total = 544.142 ms, Queueing time: mean = 113.302 us, max = 2.934 ms, min = 4.027 us, total = 631.998 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1861 total (1 active), Execution time: mean = 10.102 us, total = 18.799 ms, Queueing time: mean = 75.932 us, max = 442.307 us, min = 14.043 us, total = 141.309 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1116 total (1 active), Execution time: mean = 584.501 us, total = 652.303 ms, Queueing time: mean = 345.406 us, max = 2.010 ms, min = 9.115 us, total = 385.474 ms [state-dump] NodeManager.GcsCheckAlive - 1116 total (1 active), Execution time: mean = 319.064 us, total = 356.076 ms, Queueing time: mean = 610.378 us, max = 2.567 ms, min = 6.690 us, total = 681.181 ms [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1116 total (0 active), Execution time: mean = 56.000 us, total = 62.495 ms, Queueing time: mean = 112.737 us, max = 4.779 ms, min = 11.561 us, total = 125.814 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1116 total (0 active), Execution time: mean = 1.604 ms, total = 1.791 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 558 total (1 active), Execution time: mean = 1.785 ms, total = 995.916 ms, Queueing time: mean = 72.082 us, max = 180.398 us, min = 11.609 us, total = 40.222 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 93 total (1 active, 1 running), Execution time: mean = 2.734 ms, total = 254.274 ms, Queueing time: mean = 78.039 us, max = 325.100 us, min = 15.835 us, total = 7.258 ms [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 
ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 11 total (1 active), Execution time: mean = 490.692 s, total = 5397.609 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 10 total (0 active), Execution time: mean = 376.502 us, total = 3.765 ms, Queueing time: mean = 122.227 us, max = 243.371 us, min = 20.299 us, total = 1.222 ms [state-dump] NodeManager.GCTaskFailureReason - 7 total (1 active), Execution time: mean = 7.313 us, total = 51.189 us, Queueing time: mean = 59.264 us, max = 97.290 us, min = 24.344 us, total = 414.847 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 
us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 
total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:35:54,932 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:35:56,190 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] 
num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by 
scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request 
bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel 
WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 492233 total (35 active) [state-dump] Queueing time: mean = 206.164 us, max = 59.826 s, min = -0.001 s, total = 101.481 s [state-dump] Execution time: mean = 11.150 ms, total = 5488.427 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 118398 total (0 active), Execution time: mean = 533.054 us, total = 63.113 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 118398 total (0 active), Execution time: mean = 35.556 us, total = 4.210 s, Queueing time: mean = 111.318 us, max = 3.003 ms, min = 1.846 us, total = 13.180 s [state-dump] RaySyncer.OnDemandBroadcasting - 56346 total (1 active), Execution time: mean = 10.891 us, total = 613.692 ms, Queueing time: mean = 92.183 us, max = 25.869 ms, min = 6.166 us, total = 5.194 s [state-dump] NodeManager.CheckGC - 56346 total (1 active), Execution time: mean = 3.941 us, total = 222.072 ms, Queueing time: mean = 98.299 us, max = 25.875 ms, min = -0.000 s, total = 5.539 s [state-dump] ObjectManager.UpdateAvailableMemory - 56346 total (0 active), Execution time: mean = 6.283 us, total = 354.039 ms, Queueing time: mean = 112.094 us, max = 45.939 ms, min = 2.228 us, total = 6.316 s [state-dump] 
RayletWorkerPool.deadline_timer.kill_idle_workers - 28189 total (1 active), Execution time: mean = 18.858 us, total = 531.588 ms, Queueing time: mean = 77.878 us, max = 26.386 ms, min = -0.001 s, total = 2.195 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 22514 total (1 active), Execution time: mean = 458.625 us, total = 10.325 s, Queueing time: mean = 76.114 us, max = 4.063 ms, min = -0.000 s, total = 1.714 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 5640 total (1 active), Execution time: mean = 9.405 us, total = 53.042 ms, Queueing time: mean = 178.845 us, max = 2.947 ms, min = 3.811 us, total = 1.009 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 5640 total (1 active), Execution time: mean = 17.114 us, total = 96.525 ms, Queueing time: mean = 75.756 us, max = 2.581 ms, min = 8.735 us, total = 427.265 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 5640 total (1 active), Execution time: mean = 3.156 us, total = 17.799 ms, Queueing time: mean = 183.094 us, max = 2.946 ms, min = 3.845 us, total = 1.033 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 5638 total (0 active), Execution time: mean = 613.183 us, total = 3.457 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 5638 total (0 active), Execution time: mean = 97.556 us, total = 550.020 ms, Queueing time: mean = 113.414 us, max = 2.934 ms, min = 4.027 us, total = 639.431 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1881 total (1 active), Execution time: mean = 10.112 us, total = 19.020 ms, Queueing time: mean = 75.966 us, max = 442.307 us, min = 14.043 us, total = 142.892 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1128 total (1 active), Execution time: mean = 584.458 us, total = 659.268 ms, Queueing time: mean = 345.700 us, max = 2.010 ms, min = 9.115 us, total = 389.950 
ms [state-dump] NodeManager.GcsCheckAlive - 1128 total (1 active), Execution time: mean = 319.136 us, total = 359.986 ms, Queueing time: mean = 610.589 us, max = 2.567 ms, min = 6.690 us, total = 688.744 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1128 total (0 active), Execution time: mean = 56.037 us, total = 63.210 ms, Queueing time: mean = 112.688 us, max = 4.779 ms, min = 11.561 us, total = 127.112 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1128 total (0 active), Execution time: mean = 1.604 ms, total = 1.809 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 564 total (1 active), Execution time: mean = 1.786 ms, total = 1.007 s, Queueing time: mean = 71.989 us, max = 180.398 us, min = 11.609 us, total = 40.602 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 94 total (1 active, 1 running), Execution time: mean = 2.717 ms, total = 255.429 ms, Queueing time: mean = 77.311 us, max = 325.100 us, min = 9.635 us, total = 7.267 ms [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, 
total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 11 total (1 active), Execution time: mean = 490.692 s, total = 5397.609 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 
active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 10 total (0 active), Execution time: mean = 376.502 us, total = 3.765 ms, Queueing time: mean = 122.227 us, max = 243.371 us, min = 20.299 us, total = 1.222 ms [state-dump] NodeManager.GCTaskFailureReason - 7 total (1 active), Execution time: mean = 7.313 us, total = 51.189 us, Queueing time: mean = 59.264 us, max = 97.290 us, min = 24.344 us, total = 414.847 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: 
mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:36:54,932 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:36:56,193 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 
================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled 
unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: 
[state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative 
unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 497464 total (35 active) [state-dump] Queueing time: mean = 204.864 us, max = 59.826 s, min = -0.001 s, total = 101.912 s [state-dump] Execution time: mean = 11.035 ms, total = 5489.400 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 119658 total (0 active), Execution time: mean = 533.294 us, total = 63.813 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 119658 total (0 active), Execution time: mean = 35.582 us, total = 4.258 s, Queueing time: mean = 111.363 us, max = 3.003 ms, min = 1.846 us, total = 13.325 s [state-dump] RaySyncer.OnDemandBroadcasting - 56945 total (1 active), Execution time: mean = 10.896 us, total = 620.477 ms, Queueing time: mean = 92.237 us, max = 25.869 ms, min = 6.166 us, total = 5.252 s [state-dump] NodeManager.CheckGC - 56945 total (1 active), Execution time: mean = 3.931 us, total = 223.879 ms, Queueing time: mean = 98.366 us, max = 25.875 ms, min = -0.000 s, total = 5.601 s [state-dump] 
ObjectManager.UpdateAvailableMemory - 56945 total (0 active), Execution time: mean = 6.290 us, total = 358.173 ms, Queueing time: mean = 112.198 us, max = 45.939 ms, min = 2.228 us, total = 6.389 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 28489 total (1 active), Execution time: mean = 18.867 us, total = 537.508 ms, Queueing time: mean = 77.857 us, max = 26.386 ms, min = -0.001 s, total = 2.218 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 22753 total (1 active), Execution time: mean = 458.751 us, total = 10.438 s, Queueing time: mean = 76.123 us, max = 4.063 ms, min = -0.000 s, total = 1.732 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 5700 total (1 active), Execution time: mean = 9.405 us, total = 53.609 ms, Queueing time: mean = 178.981 us, max = 2.947 ms, min = 3.811 us, total = 1.020 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 5700 total (1 active), Execution time: mean = 17.140 us, total = 97.698 ms, Queueing time: mean = 75.763 us, max = 2.581 ms, min = 8.735 us, total = 431.850 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 5700 total (1 active), Execution time: mean = 3.156 us, total = 17.990 ms, Queueing time: mean = 183.229 us, max = 2.946 ms, min = 3.845 us, total = 1.044 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 5698 total (0 active), Execution time: mean = 613.477 us, total = 3.496 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 5698 total (0 active), Execution time: mean = 97.594 us, total = 556.089 ms, Queueing time: mean = 113.528 us, max = 2.934 ms, min = 4.027 us, total = 646.882 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1901 total (1 active), Execution time: mean = 10.118 us, total = 19.233 ms, Queueing time: mean = 75.972 us, max = 442.307 us, min = 14.043 us, total = 144.422 ms 
[state-dump] NodeManager.deadline_timer.record_metrics - 1140 total (1 active), Execution time: mean = 584.772 us, total = 666.640 ms, Queueing time: mean = 346.070 us, max = 2.010 ms, min = 9.115 us, total = 394.520 ms [state-dump] NodeManager.GcsCheckAlive - 1140 total (1 active), Execution time: mean = 319.485 us, total = 364.213 ms, Queueing time: mean = 610.980 us, max = 2.567 ms, min = 6.690 us, total = 696.518 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1140 total (0 active), Execution time: mean = 56.120 us, total = 63.977 ms, Queueing time: mean = 112.481 us, max = 4.779 ms, min = 11.561 us, total = 128.228 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1140 total (0 active), Execution time: mean = 1.605 ms, total = 1.830 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 570 total (1 active), Execution time: mean = 1.787 ms, total = 1.018 s, Queueing time: mean = 72.091 us, max = 180.398 us, min = 11.609 us, total = 41.092 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 95 total (1 active, 1 running), Execution time: mean = 2.720 ms, total = 258.423 ms, Queueing time: mean = 77.038 us, max = 325.100 us, min = 9.635 us, total = 7.319 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 
956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 11 total (1 
active), Execution time: mean = 490.692 s, total = 5397.609 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 10 total (0 active), Execution time: mean = 376.502 us, total = 3.765 ms, Queueing time: mean = 122.227 us, max = 243.371 us, min = 20.299 us, total = 1.222 ms [state-dump] NodeManager.GCTaskFailureReason - 7 total (1 active), Execution time: mean = 7.313 us, total = 51.189 us, Queueing time: mean = 59.264 us, max = 97.290 us, min = 24.344 us, total = 414.847 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] 
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), 
Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:37:54,932 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:37:56,196 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 
200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} 
[state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] 
- num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 502698 total (35 active) [state-dump] Queueing time: mean = 203.563 us, 
max = 59.826 s, min = -0.001 s, total = 102.331 s [state-dump] Execution time: mean = 10.922 ms, total = 5490.335 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 120918 total (0 active), Execution time: mean = 533.324 us, total = 64.488 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 120918 total (0 active), Execution time: mean = 35.573 us, total = 4.301 s, Queueing time: mean = 111.423 us, max = 3.003 ms, min = 1.846 us, total = 13.473 s [state-dump] RaySyncer.OnDemandBroadcasting - 57545 total (1 active), Execution time: mean = 10.897 us, total = 627.051 ms, Queueing time: mean = 92.237 us, max = 25.869 ms, min = 6.166 us, total = 5.308 s [state-dump] NodeManager.CheckGC - 57545 total (1 active), Execution time: mean = 3.922 us, total = 225.692 ms, Queueing time: mean = 98.375 us, max = 25.875 ms, min = -0.000 s, total = 5.661 s [state-dump] ObjectManager.UpdateAvailableMemory - 57545 total (0 active), Execution time: mean = 6.291 us, total = 362.000 ms, Queueing time: mean = 112.252 us, max = 45.939 ms, min = 2.228 us, total = 6.460 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 28788 total (1 active), Execution time: mean = 18.869 us, total = 543.213 ms, Queueing time: mean = 77.819 us, max = 26.386 ms, min = -0.001 s, total = 2.240 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 22993 total (1 active), Execution time: mean = 458.792 us, total = 10.549 s, Queueing time: mean = 76.115 us, max = 4.063 ms, min = -0.000 s, total = 1.750 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 5760 total (1 active), Execution time: mean = 9.412 us, total = 54.215 ms, Queueing time: mean = 178.826 us, max = 2.947 ms, min = 3.811 us, total = 1.030 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 5760 total (1 active), Execution time: mean = 17.145 us, 
total = 98.755 ms, Queueing time: mean = 75.753 us, max = 2.581 ms, min = 8.735 us, total = 436.336 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 5760 total (1 active), Execution time: mean = 3.155 us, total = 18.171 ms, Queueing time: mean = 183.078 us, max = 2.946 ms, min = 3.845 us, total = 1.055 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 5758 total (0 active), Execution time: mean = 613.418 us, total = 3.532 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 5758 total (0 active), Execution time: mean = 97.547 us, total = 561.673 ms, Queueing time: mean = 113.599 us, max = 2.934 ms, min = 4.027 us, total = 654.104 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1921 total (1 active), Execution time: mean = 10.110 us, total = 19.421 ms, Queueing time: mean = 75.945 us, max = 442.307 us, min = 14.043 us, total = 145.890 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1152 total (1 active), Execution time: mean = 584.472 us, total = 673.312 ms, Queueing time: mean = 345.652 us, max = 2.010 ms, min = 9.115 us, total = 398.191 ms [state-dump] NodeManager.GcsCheckAlive - 1152 total (1 active), Execution time: mean = 319.347 us, total = 367.888 ms, Queueing time: mean = 610.331 us, max = 2.567 ms, min = 6.690 us, total = 703.101 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1152 total (0 active), Execution time: mean = 56.075 us, total = 64.598 ms, Queueing time: mean = 112.402 us, max = 4.779 ms, min = 11.561 us, total = 129.487 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1152 total (0 active), Execution time: mean = 1.604 ms, total = 1.848 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 576 total (1 active), 
Execution time: mean = 1.786 ms, total = 1.028 s, Queueing time: mean = 72.071 us, max = 180.398 us, min = 11.609 us, total = 41.513 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 96 total (1 active, 1 running), Execution time: mean = 2.725 ms, total = 261.576 ms, Queueing time: mean = 76.995 us, max = 325.100 us, min = 9.635 us, total = 7.392 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 
16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 11 total (1 active), Execution time: mean = 490.692 s, total = 5397.609 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 
ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 10 total (0 active), Execution time: mean = 376.502 us, total = 3.765 ms, Queueing time: mean = 122.227 us, max = 243.371 us, min = 20.299 us, total = 1.222 ms [state-dump] NodeManager.GCTaskFailureReason - 7 total (1 active), Execution time: mean = 7.313 us, total = 51.189 us, Queueing time: mean = 59.264 us, max = 97.290 us, min = 24.344 us, total = 414.847 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 
active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 
122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:38:54,932 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:38:56,198 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: 
[21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: 
[state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles 
being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 
[state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 507929 total (35 active) [state-dump] Queueing time: mean = 202.319 us, max = 59.826 s, min = -0.001 s, total = 102.764 s [state-dump] Execution time: mean = 10.811 ms, total = 5491.303 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 122178 total (0 active), Execution time: mean = 533.530 us, total = 65.186 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 122178 total (0 active), Execution time: mean = 35.600 us, total = 4.350 s, Queueing time: mean = 111.480 us, max = 3.003 ms, min = 1.846 us, total = 13.620 s [state-dump] RaySyncer.OnDemandBroadcasting - 58144 total (1 active), Execution time: mean = 10.915 us, total = 634.662 ms, Queueing time: mean = 92.317 us, max = 25.869 ms, min = 6.166 us, total = 5.368 s [state-dump] NodeManager.CheckGC - 58144 total (1 active), Execution time: mean = 3.913 us, total = 227.536 ms, Queueing time: mean = 98.480 us, max = 25.875 ms, min = -0.000 s, total = 5.726 s [state-dump] ObjectManager.UpdateAvailableMemory - 58144 total (0 active), Execution time: mean = 6.299 us, total = 366.249 ms, Queueing time: mean = 112.322 us, max = 45.939 ms, min = 2.228 us, total = 6.531 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 29088 total (1 active), Execution time: mean = 18.873 us, total = 548.983 ms, Queueing time: mean = 77.806 us, max = 26.386 ms, min = -0.001 s, total = 2.263 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 23232 total (1 active), Execution time: mean = 458.899 us, total = 10.661 s, Queueing time: mean = 76.149 us, max = 4.063 ms, min = -0.000 s, total = 1.769 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 5820 total (1 active), Execution time: mean = 9.421 us, total = 54.828 ms, Queueing time: mean = 178.778 
us, max = 2.947 ms, min = 3.811 us, total = 1.040 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 5820 total (1 active), Execution time: mean = 17.155 us, total = 99.841 ms, Queueing time: mean = 75.756 us, max = 2.581 ms, min = 8.735 us, total = 440.900 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 5820 total (1 active), Execution time: mean = 3.155 us, total = 18.360 ms, Queueing time: mean = 183.035 us, max = 2.946 ms, min = 3.845 us, total = 1.065 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 5818 total (0 active), Execution time: mean = 613.605 us, total = 3.570 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 5818 total (0 active), Execution time: mean = 97.610 us, total = 567.895 ms, Queueing time: mean = 113.697 us, max = 2.934 ms, min = 4.027 us, total = 661.488 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1941 total (1 active), Execution time: mean = 10.123 us, total = 19.649 ms, Queueing time: mean = 75.991 us, max = 442.307 us, min = 14.043 us, total = 147.498 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1164 total (1 active), Execution time: mean = 584.495 us, total = 680.352 ms, Queueing time: mean = 345.376 us, max = 2.010 ms, min = 9.115 us, total = 402.018 ms [state-dump] NodeManager.GcsCheckAlive - 1164 total (1 active), Execution time: mean = 319.411 us, total = 371.795 ms, Queueing time: mean = 609.997 us, max = 2.567 ms, min = 6.690 us, total = 710.036 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1164 total (0 active), Execution time: mean = 56.104 us, total = 65.305 ms, Queueing time: mean = 112.441 us, max = 4.779 ms, min = 11.561 us, total = 130.882 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1164 total (0 active), Execution time: mean = 1.604 ms, total = 1.868 s, 
Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 582 total (1 active), Execution time: mean = 1.785 ms, total = 1.039 s, Queueing time: mean = 72.016 us, max = 180.398 us, min = 11.609 us, total = 41.914 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 97 total (1 active, 1 running), Execution time: mean = 2.727 ms, total = 264.504 ms, Queueing time: mean = 77.095 us, max = 325.100 us, min = 9.635 us, total = 7.478 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 
s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 11 total (1 active), Execution time: mean = 490.692 s, total = 5397.609 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, 
total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 10 total (0 active), Execution time: mean = 376.502 us, total = 3.765 ms, Queueing time: mean = 122.227 us, max = 243.371 us, min = 20.299 us, total = 1.222 ms [state-dump] NodeManager.GCTaskFailureReason - 7 total (1 active), Execution time: mean = 7.313 us, total = 51.189 us, Queueing time: mean = 59.264 us, max = 97.290 us, min = 24.344 us, total = 414.847 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: 
mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:39:54,933 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:39:56,202 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], 
memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - 
cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A 
[state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active 
subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 513161 total (35 active) [state-dump] Queueing time: mean = 201.111 us, max = 59.826 s, min = -0.001 s, total = 103.202 s [state-dump] Execution time: mean = 10.703 ms, total = 5492.279 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 123438 total (0 active), Execution time: mean = 533.785 us, total = 65.889 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 123438 total (0 active), Execution time: mean = 35.607 us, total = 4.395 s, Queueing time: mean = 111.569 us, max = 3.003 ms, min = 1.846 us, total = 13.772 s [state-dump] RaySyncer.OnDemandBroadcasting - 58743 total (1 active), Execution time: mean = 10.927 us, total = 641.881 ms, Queueing time: mean = 92.374 us, max = 25.869 ms, min = 6.166 us, total = 5.426 s [state-dump] NodeManager.CheckGC - 58743 total (1 active), Execution time: mean = 3.905 us, total = 229.406 ms, Queueing time: mean = 98.555 us, max = 25.875 ms, min = -0.000 s, total = 5.789 s [state-dump] ObjectManager.UpdateAvailableMemory - 58743 total (0 active), Execution time: mean = 6.307 us, total = 370.468 ms, Queueing time: mean = 112.417 us, max = 45.939 ms, min = 2.228 us, total = 6.604 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 29388 total (1 active), Execution time: mean = 18.900 us, total = 555.433 ms, Queueing time: mean = 77.826 us, max = 26.386 ms, min = -0.001 s, total = 2.287 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 23472 total (1 active), Execution time: mean = 459.054 us, total = 10.775 s, Queueing time: mean = 76.187 us, max = 4.063 ms, min = -0.000 s, total = 1.788 s 
[state-dump] NodeManager.deadline_timer.flush_free_objects - 5880 total (1 active), Execution time: mean = 9.435 us, total = 55.479 ms, Queueing time: mean = 178.804 us, max = 2.947 ms, min = 3.811 us, total = 1.051 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 5880 total (1 active), Execution time: mean = 17.187 us, total = 101.060 ms, Queueing time: mean = 75.790 us, max = 2.581 ms, min = 8.735 us, total = 445.646 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 5880 total (1 active), Execution time: mean = 3.156 us, total = 18.556 ms, Queueing time: mean = 183.068 us, max = 2.946 ms, min = 3.845 us, total = 1.076 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 5878 total (0 active), Execution time: mean = 613.974 us, total = 3.609 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 5878 total (0 active), Execution time: mean = 97.662 us, total = 574.060 ms, Queueing time: mean = 113.792 us, max = 2.934 ms, min = 4.027 us, total = 668.867 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1961 total (1 active), Execution time: mean = 10.124 us, total = 19.854 ms, Queueing time: mean = 76.019 us, max = 442.307 us, min = 14.043 us, total = 149.073 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1176 total (1 active), Execution time: mean = 584.714 us, total = 687.623 ms, Queueing time: mean = 345.395 us, max = 2.010 ms, min = 9.115 us, total = 406.185 ms [state-dump] NodeManager.GcsCheckAlive - 1176 total (1 active), Execution time: mean = 319.688 us, total = 375.953 ms, Queueing time: mean = 609.985 us, max = 2.567 ms, min = 6.690 us, total = 717.342 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1176 total (0 active), Execution time: mean = 56.102 us, total = 65.976 ms, Queueing time: mean = 112.459 us, max = 4.779 ms, min = 11.561 
us, total = 132.251 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1176 total (0 active), Execution time: mean = 1.605 ms, total = 1.887 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 588 total (1 active), Execution time: mean = 1.785 ms, total = 1.050 s, Queueing time: mean = 72.160 us, max = 180.398 us, min = 11.609 us, total = 42.430 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 98 total (1 active, 1 running), Execution time: mean = 2.723 ms, total = 266.872 ms, Queueing time: mean = 77.420 us, max = 325.100 us, min = 9.635 us, total = 7.587 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms 
[state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 11 total (1 active), Execution time: mean = 490.692 s, total = 5397.609 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] 
NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 10 total (0 active), Execution time: mean = 376.502 us, total = 3.765 ms, Queueing time: mean = 122.227 us, max = 243.371 us, min = 20.299 us, total = 1.222 ms [state-dump] NodeManager.GCTaskFailureReason - 7 total (1 active), Execution time: mean = 7.313 us, total = 51.189 us, Queueing time: mean = 59.264 us, max = 97.290 us, min = 24.344 us, total = 414.847 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, 
min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived 
- 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 00:40:54,933 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:40:56,204 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] 
num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 
0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: 
BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] 
- cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 518395 total (35 active) [state-dump] Queueing time: mean = 199.868 us, max = 59.826 s, min = -0.001 s, total = 103.611 s [state-dump] Execution time: mean = 10.597 ms, total = 5493.205 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 124698 total (0 active), Execution time: mean = 533.718 us, total = 66.554 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 124698 total (0 active), Execution time: mean = 35.604 us, total = 4.440 s, Queueing time: mean = 111.550 us, max = 3.003 ms, min = 1.846 us, total = 13.910 s [state-dump] RaySyncer.OnDemandBroadcasting - 59343 total (1 active), Execution time: mean = 10.929 us, total = 648.586 ms, Queueing time: mean = 92.414 us, max = 25.869 ms, min = 6.166 us, total = 5.484 s [state-dump] NodeManager.CheckGC - 59343 total (1 active), Execution time: mean = 3.896 us, total = 231.173 ms, Queueing time: mean = 98.607 us, max = 25.875 ms, min = -0.000 s, total = 5.852 s [state-dump] ObjectManager.UpdateAvailableMemory - 59343 total (0 active), Execution time: mean = 6.306 us, total = 374.221 ms, Queueing time: mean = 112.316 us, max = 45.939 ms, min = 2.228 us, total = 6.665 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 29688 total (1 active), Execution time: mean = 18.901 us, total = 561.144 ms, Queueing time: mean = 77.790 us, max = 26.386 ms, min = -0.001 s, total = 2.309 s [state-dump] 
MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 23711 total (1 active), Execution time: mean = 459.038 us, total = 10.884 s, Queueing time: mean = 76.222 us, max = 4.063 ms, min = -0.000 s, total = 1.807 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 5940 total (1 active), Execution time: mean = 9.437 us, total = 56.055 ms, Queueing time: mean = 178.837 us, max = 2.947 ms, min = 3.811 us, total = 1.062 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 5940 total (1 active), Execution time: mean = 17.192 us, total = 102.123 ms, Queueing time: mean = 75.779 us, max = 2.581 ms, min = 8.735 us, total = 450.125 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 5940 total (1 active), Execution time: mean = 3.157 us, total = 18.754 ms, Queueing time: mean = 183.100 us, max = 2.946 ms, min = 3.845 us, total = 1.088 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 5938 total (0 active), Execution time: mean = 614.025 us, total = 3.646 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 5938 total (0 active), Execution time: mean = 97.668 us, total = 579.955 ms, Queueing time: mean = 113.860 us, max = 2.934 ms, min = 4.027 us, total = 676.100 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1981 total (1 active), Execution time: mean = 10.112 us, total = 20.032 ms, Queueing time: mean = 75.932 us, max = 442.307 us, min = 14.043 us, total = 150.421 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1188 total (1 active), Execution time: mean = 584.757 us, total = 694.691 ms, Queueing time: mean = 345.304 us, max = 2.010 ms, min = 9.115 us, total = 410.221 ms [state-dump] NodeManager.GcsCheckAlive - 1188 total (1 active), Execution time: mean = 320.199 us, total = 380.396 ms, Queueing time: mean = 609.382 us, max = 2.567 ms, min = 6.690 us, total = 723.945 ms [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1188 total (0 active), Execution time: mean = 56.123 us, total = 66.674 ms, Queueing time: mean = 112.321 us, max = 4.779 ms, min = 11.561 us, total = 133.437 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1188 total (0 active), Execution time: mean = 1.605 ms, total = 1.907 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 594 total (1 active), Execution time: mean = 1.786 ms, total = 1.061 s, Queueing time: mean = 72.171 us, max = 180.398 us, min = 11.609 us, total = 42.870 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 99 total (1 active, 1 running), Execution time: mean = 2.726 ms, total = 269.868 ms, Queueing time: mean = 77.311 us, max = 325.100 us, min = 9.635 us, total = 7.654 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms 
[state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 11 total (1 active), Execution time: mean = 490.692 s, total = 5397.609 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 10 total (0 active), Execution time: mean = 376.502 us, total = 3.765 ms, Queueing time: mean = 122.227 us, max = 243.371 us, min = 20.299 us, total = 1.222 ms [state-dump] NodeManager.GCTaskFailureReason - 7 total (1 active), Execution time: mean = 7.313 us, total = 51.189 us, Queueing time: mean = 59.264 us, max = 97.290 us, min = 24.344 us, total = 414.847 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 
us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 
total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:41:54,933 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:41:56,207 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] 
num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by 
scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request 
bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel 
WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 523629 total (35 active) [state-dump] Queueing time: mean = 198.562 us, max = 59.826 s, min = -0.001 s, total = 103.973 s [state-dump] Execution time: mean = 11.638 ms, total = 6094.087 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 125958 total (0 active), Execution time: mean = 533.361 us, total = 67.181 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 125958 total (0 active), Execution time: mean = 35.560 us, total = 4.479 s, Queueing time: mean = 111.367 us, max = 3.003 ms, min = 1.846 us, total = 14.028 s [state-dump] RaySyncer.OnDemandBroadcasting - 59942 total (1 active), Execution time: mean = 10.927 us, total = 654.957 ms, Queueing time: mean = 92.355 us, max = 25.869 ms, min = 6.166 us, total = 5.536 s [state-dump] NodeManager.CheckGC - 59942 total (1 active), Execution time: mean = 3.887 us, total = 232.973 ms, Queueing time: mean = 98.553 us, max = 25.875 ms, min = -0.000 s, total = 5.907 s [state-dump] ObjectManager.UpdateAvailableMemory - 59942 total (0 active), Execution time: mean = 6.302 us, total = 377.746 ms, Queueing time: mean = 112.118 us, max = 45.939 ms, min = 2.228 us, total = 6.721 s [state-dump] 
RayletWorkerPool.deadline_timer.kill_idle_workers - 29988 total (1 active), Execution time: mean = 18.880 us, total = 566.181 ms, Queueing time: mean = 77.670 us, max = 26.386 ms, min = -0.001 s, total = 2.329 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 23951 total (1 active), Execution time: mean = 459.024 us, total = 10.994 s, Queueing time: mean = 76.153 us, max = 4.063 ms, min = -0.000 s, total = 1.824 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 6000 total (1 active), Execution time: mean = 9.440 us, total = 56.642 ms, Queueing time: mean = 178.747 us, max = 2.947 ms, min = 3.811 us, total = 1.072 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 6000 total (1 active), Execution time: mean = 17.206 us, total = 103.237 ms, Queueing time: mean = 75.761 us, max = 2.581 ms, min = 8.735 us, total = 454.564 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 6000 total (1 active), Execution time: mean = 3.156 us, total = 18.935 ms, Queueing time: mean = 183.011 us, max = 2.946 ms, min = 3.845 us, total = 1.098 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 5998 total (0 active), Execution time: mean = 613.604 us, total = 3.680 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 5998 total (0 active), Execution time: mean = 97.630 us, total = 585.585 ms, Queueing time: mean = 113.647 us, max = 2.934 ms, min = 4.027 us, total = 681.654 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2001 total (1 active), Execution time: mean = 10.096 us, total = 20.202 ms, Queueing time: mean = 75.755 us, max = 442.307 us, min = 14.043 us, total = 151.586 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1200 total (1 active), Execution time: mean = 584.617 us, total = 701.540 ms, Queueing time: mean = 345.274 us, max = 2.010 ms, min = 9.115 us, total = 414.329 
ms [state-dump] NodeManager.GcsCheckAlive - 1200 total (1 active), Execution time: mean = 320.221 us, total = 384.265 ms, Queueing time: mean = 609.217 us, max = 2.567 ms, min = 6.690 us, total = 731.060 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1200 total (0 active), Execution time: mean = 56.092 us, total = 67.310 ms, Queueing time: mean = 112.263 us, max = 4.779 ms, min = 11.561 us, total = 134.716 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1200 total (0 active), Execution time: mean = 1.604 ms, total = 1.925 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 600 total (1 active), Execution time: mean = 1.786 ms, total = 1.071 s, Queueing time: mean = 72.086 us, max = 180.398 us, min = 11.609 us, total = 43.252 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 100 total (1 active, 1 running), Execution time: mean = 2.728 ms, total = 272.757 ms, Queueing time: mean = 77.218 us, max = 325.100 us, min = 9.635 us, total = 7.722 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, 
total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 12 total (1 active), Execution time: mean = 499.801 s, total = 5997.612 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] 
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 11 total (0 active), Execution time: mean = 377.030 us, total = 4.147 ms, Queueing time: mean = 127.582 us, max = 243.371 us, min = 20.299 us, total = 1.403 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 7 total (1 active), Execution time: mean = 7.313 us, total = 51.189 us, Queueing time: mean = 59.264 us, max = 97.290 us, min = 24.344 us, total = 414.847 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, 
total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 
total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 00:42:54,933 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:42:56,210 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 
94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled 
waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer 
state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe 
requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 528863 total (35 active) [state-dump] Queueing time: mean = 197.418 us, max = 59.826 s, min = -0.001 s, total = 104.407 s [state-dump] Execution time: mean = 11.525 ms, total = 6095.039 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 127218 total (0 active), Execution time: mean = 533.453 us, total = 67.865 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 127218 total (0 active), Execution time: mean = 35.567 us, total = 4.525 s, Queueing time: mean = 111.445 us, max = 3.003 ms, min = 1.846 us, total = 14.178 s [state-dump] RaySyncer.OnDemandBroadcasting - 60542 total (1 active), Execution time: mean = 10.934 us, total = 661.976 ms, Queueing time: mean = 92.383 us, max = 25.869 ms, min = 6.166 us, total = 5.593 s [state-dump] NodeManager.CheckGC - 60542 total (1 active), Execution time: mean = 3.879 us, total = 234.834 ms, Queueing time: mean = 98.595 us, max = 25.875 ms, min = -0.000 s, total = 5.969 
s [state-dump] ObjectManager.UpdateAvailableMemory - 60542 total (0 active), Execution time: mean = 6.310 us, total = 381.993 ms, Queueing time: mean = 112.196 us, max = 45.939 ms, min = 2.228 us, total = 6.793 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 30288 total (1 active), Execution time: mean = 18.906 us, total = 572.624 ms, Queueing time: mean = 77.796 us, max = 26.386 ms, min = -0.001 s, total = 2.356 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 24190 total (1 active), Execution time: mean = 459.157 us, total = 11.107 s, Queueing time: mean = 76.180 us, max = 4.063 ms, min = -0.000 s, total = 1.843 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 6060 total (1 active), Execution time: mean = 9.449 us, total = 57.258 ms, Queueing time: mean = 178.711 us, max = 2.947 ms, min = 3.811 us, total = 1.083 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 6060 total (1 active), Execution time: mean = 17.225 us, total = 104.383 ms, Queueing time: mean = 75.772 us, max = 2.581 ms, min = 8.735 us, total = 459.179 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 6060 total (1 active), Execution time: mean = 3.159 us, total = 19.145 ms, Queueing time: mean = 182.979 us, max = 2.946 ms, min = 3.845 us, total = 1.109 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 6058 total (0 active), Execution time: mean = 613.767 us, total = 3.718 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 6058 total (0 active), Execution time: mean = 97.681 us, total = 591.752 ms, Queueing time: mean = 113.692 us, max = 2.934 ms, min = 4.027 us, total = 688.748 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2021 total (1 active), Execution time: mean = 10.103 us, total = 20.418 ms, Queueing time: mean = 75.813 us, max = 442.307 us, min = 14.043 us, total = 
153.218 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1212 total (1 active), Execution time: mean = 584.115 us, total = 707.947 ms, Queueing time: mean = 345.570 us, max = 2.010 ms, min = 9.115 us, total = 418.831 ms [state-dump] NodeManager.GcsCheckAlive - 1212 total (1 active), Execution time: mean = 320.690 us, total = 388.677 ms, Queueing time: mean = 608.589 us, max = 2.567 ms, min = 6.690 us, total = 737.609 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1212 total (0 active), Execution time: mean = 56.117 us, total = 68.014 ms, Queueing time: mean = 112.220 us, max = 4.779 ms, min = 11.561 us, total = 136.010 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1212 total (0 active), Execution time: mean = 1.604 ms, total = 1.944 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 606 total (1 active), Execution time: mean = 1.785 ms, total = 1.082 s, Queueing time: mean = 72.152 us, max = 180.398 us, min = 11.609 us, total = 43.724 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 101 total (1 active, 1 running), Execution time: mean = 2.730 ms, total = 275.695 ms, Queueing time: mean = 76.770 us, max = 325.100 us, min = 9.635 us, total = 7.754 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, 
total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 12 total (1 active), Execution time: mean = 499.801 s, total = 5997.612 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 
total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 11 total (0 active), Execution time: mean = 377.030 us, total = 4.147 ms, Queueing time: mean = 127.582 us, max = 243.371 us, min = 20.299 us, total = 1.403 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 7 total (1 active), Execution time: mean = 7.313 us, total = 51.189 us, Queueing time: mean = 59.264 us, max = 97.290 us, min = 24.344 us, total = 414.847 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s 
[state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 
us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 00:43:54,934 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:43:56,213 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, 
node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] 
Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 
[state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 534095 total (35 active) [state-dump] Queueing time: mean = 196.317 us, 
max = 59.826 s, min = -0.001 s, total = 104.852 s [state-dump] Execution time: mean = 11.414 ms, total = 6095.981 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 128478 total (0 active), Execution time: mean = 533.427 us, total = 68.534 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 128478 total (0 active), Execution time: mean = 35.578 us, total = 4.571 s, Queueing time: mean = 111.482 us, max = 3.003 ms, min = 1.846 us, total = 14.323 s [state-dump] RaySyncer.OnDemandBroadcasting - 61141 total (1 active), Execution time: mean = 10.969 us, total = 670.642 ms, Queueing time: mean = 92.509 us, max = 25.869 ms, min = 6.166 us, total = 5.656 s [state-dump] NodeManager.CheckGC - 61141 total (1 active), Execution time: mean = 3.873 us, total = 236.796 ms, Queueing time: mean = 98.759 us, max = 25.875 ms, min = -0.000 s, total = 6.038 s [state-dump] ObjectManager.UpdateAvailableMemory - 61141 total (0 active), Execution time: mean = 6.319 us, total = 386.333 ms, Queueing time: mean = 112.173 us, max = 45.939 ms, min = 2.228 us, total = 6.858 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 30588 total (1 active), Execution time: mean = 18.956 us, total = 579.837 ms, Queueing time: mean = 78.002 us, max = 26.386 ms, min = -0.001 s, total = 2.386 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 24430 total (1 active), Execution time: mean = 459.151 us, total = 11.217 s, Queueing time: mean = 76.276 us, max = 4.063 ms, min = -0.000 s, total = 1.863 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 6120 total (1 active), Execution time: mean = 9.476 us, total = 57.991 ms, Queueing time: mean = 178.846 us, max = 2.947 ms, min = 3.811 us, total = 1.095 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 6120 total (1 active), Execution time: mean = 17.293 us, 
total = 105.836 ms, Queueing time: mean = 75.965 us, max = 2.581 ms, min = 8.735 us, total = 464.908 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 6120 total (1 active), Execution time: mean = 3.162 us, total = 19.352 ms, Queueing time: mean = 183.128 us, max = 2.946 ms, min = 3.845 us, total = 1.121 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 6118 total (0 active), Execution time: mean = 614.098 us, total = 3.757 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 6118 total (0 active), Execution time: mean = 97.736 us, total = 597.950 ms, Queueing time: mean = 113.615 us, max = 2.934 ms, min = 4.027 us, total = 695.096 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2041 total (1 active), Execution time: mean = 10.140 us, total = 20.695 ms, Queueing time: mean = 75.813 us, max = 442.307 us, min = 14.043 us, total = 154.734 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1224 total (1 active), Execution time: mean = 584.708 us, total = 715.682 ms, Queueing time: mean = 345.773 us, max = 2.010 ms, min = 9.115 us, total = 423.226 ms [state-dump] NodeManager.GcsCheckAlive - 1224 total (1 active), Execution time: mean = 320.985 us, total = 392.886 ms, Queueing time: mean = 609.051 us, max = 2.567 ms, min = 6.690 us, total = 745.479 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1224 total (0 active), Execution time: mean = 56.178 us, total = 68.762 ms, Queueing time: mean = 112.417 us, max = 4.779 ms, min = 11.561 us, total = 137.598 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1224 total (0 active), Execution time: mean = 1.605 ms, total = 1.964 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 612 total (1 
active), Execution time: mean = 1.786 ms, total = 1.093 s, Queueing time: mean = 72.402 us, max = 248.460 us, min = 11.609 us, total = 44.310 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 102 total (1 active, 1 running), Execution time: mean = 2.729 ms, total = 278.378 ms, Queueing time: mean = 76.844 us, max = 325.100 us, min = 9.635 us, total = 7.838 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: 
mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 12 total (1 active), Execution time: mean = 499.801 s, total = 5997.612 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 11 total (0 active), Execution time: mean = 377.030 us, total = 4.147 ms, Queueing time: mean = 127.582 us, max = 243.371 us, min = 20.299 us, total = 1.403 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, 
total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 7 total (1 active), Execution time: mean = 7.313 us, total = 51.189 us, Queueing time: mean = 59.264 us, max = 97.290 us, min = 24.344 us, total = 414.847 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 
total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: 
mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:44:54,934 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:44:56,216 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: 
[21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: 
[state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles 
being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 
[state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 539326 total (35 active) [state-dump] Queueing time: mean = 195.173 us, max = 59.826 s, min = -0.001 s, total = 105.262 s [state-dump] Execution time: mean = 11.305 ms, total = 6096.873 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 129738 total (0 active), Execution time: mean = 533.099 us, total = 69.163 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 129738 total (0 active), Execution time: mean = 35.572 us, total = 4.615 s, Queueing time: mean = 111.437 us, max = 3.003 ms, min = 1.846 us, total = 14.458 s [state-dump] RaySyncer.OnDemandBroadcasting - 61740 total (1 active), Execution time: mean = 10.994 us, total = 678.798 ms, Queueing time: mean = 92.600 us, max = 25.869 ms, min = 6.166 us, total = 5.717 s [state-dump] NodeManager.CheckGC - 61740 total (1 active), Execution time: mean = 3.865 us, total = 238.624 ms, Queueing time: mean = 98.882 us, max = 25.875 ms, min = -0.000 s, total = 6.105 s [state-dump] ObjectManager.UpdateAvailableMemory - 61740 total (0 active), Execution time: mean = 6.325 us, total = 390.530 ms, Queueing time: mean = 112.084 us, max = 45.939 ms, min = 2.228 us, total = 6.920 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 30888 total (1 active), Execution time: mean = 18.984 us, total = 586.378 ms, Queueing time: mean = 77.988 us, max = 26.386 ms, min = -0.001 s, total = 2.409 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 24669 total (1 active), Execution time: mean = 459.184 us, total = 11.328 s, Queueing time: mean = 76.347 us, max = 4.063 ms, min = -0.000 s, total = 1.883 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 6180 total (1 active), Execution time: mean = 9.507 us, total = 58.751 ms, Queueing time: mean = 178.650 
us, max = 2.947 ms, min = 3.811 us, total = 1.104 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 6180 total (1 active), Execution time: mean = 17.342 us, total = 107.174 ms, Queueing time: mean = 76.186 us, max = 2.581 ms, min = 8.735 us, total = 470.831 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 6180 total (1 active), Execution time: mean = 3.163 us, total = 19.549 ms, Queueing time: mean = 182.951 us, max = 2.946 ms, min = 3.845 us, total = 1.131 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 6178 total (0 active), Execution time: mean = 613.852 us, total = 3.792 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 6178 total (0 active), Execution time: mean = 97.740 us, total = 603.839 ms, Queueing time: mean = 113.463 us, max = 2.934 ms, min = 4.027 us, total = 700.974 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2061 total (1 active), Execution time: mean = 10.161 us, total = 20.941 ms, Queueing time: mean = 75.770 us, max = 442.307 us, min = 14.043 us, total = 156.161 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1236 total (1 active), Execution time: mean = 584.944 us, total = 722.990 ms, Queueing time: mean = 344.514 us, max = 2.010 ms, min = 9.115 us, total = 425.819 ms [state-dump] NodeManager.GcsCheckAlive - 1236 total (1 active), Execution time: mean = 321.266 us, total = 397.084 ms, Queueing time: mean = 607.739 us, max = 2.567 ms, min = 6.690 us, total = 751.165 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1236 total (0 active), Execution time: mean = 56.148 us, total = 69.399 ms, Queueing time: mean = 112.404 us, max = 4.779 ms, min = 11.561 us, total = 138.931 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1236 total (0 active), Execution time: mean = 1.604 ms, total = 1.983 s, 
Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 618 total (1 active), Execution time: mean = 1.784 ms, total = 1.103 s, Queueing time: mean = 72.354 us, max = 248.460 us, min = 11.609 us, total = 44.715 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 103 total (1 active, 1 running), Execution time: mean = 2.729 ms, total = 281.045 ms, Queueing time: mean = 77.011 us, max = 325.100 us, min = 9.635 us, total = 7.932 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 
s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 12 total (1 active), Execution time: mean = 499.801 s, total = 5997.612 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 11 total (0 active), Execution time: mean = 377.030 us, total = 4.147 ms, Queueing time: mean = 127.582 us, max = 243.371 us, min = 20.299 us, total = 1.403 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 
us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 7 total (1 active), Execution time: mean = 7.313 us, total = 51.189 us, Queueing time: mean = 59.264 us, max = 97.290 us, min = 24.344 us, total = 414.847 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, 
Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:45:54,934 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:45:56,219 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], 
memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - 
cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A 
[state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active 
subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 544560 total (35 active) [state-dump] Queueing time: mean = 194.097 us, max = 59.826 s, min = -0.001 s, total = 105.698 s [state-dump] Execution time: mean = 11.198 ms, total = 6097.851 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 130998 total (0 active), Execution time: mean = 533.406 us, total = 69.875 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 130998 total (0 active), Execution time: mean = 35.583 us, total = 4.661 s, Queueing time: mean = 111.566 us, max = 3.003 ms, min = 1.846 us, total = 14.615 s [state-dump] RaySyncer.OnDemandBroadcasting - 62340 total (1 active), Execution time: mean = 11.000 us, total = 685.763 ms, Queueing time: mean = 92.618 us, max = 25.869 ms, min = 6.166 us, total = 5.774 s [state-dump] NodeManager.CheckGC - 62340 total (1 active), Execution time: mean = 3.857 us, total = 240.454 ms, Queueing time: mean = 98.912 us, max = 25.875 ms, min = -0.000 s, total = 6.166 s [state-dump] ObjectManager.UpdateAvailableMemory - 62340 total (0 active), Execution time: mean = 6.329 us, total = 394.526 ms, Queueing time: mean = 112.102 us, max = 45.939 ms, min = 2.228 us, total = 6.988 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 31187 total (1 active), Execution time: mean = 18.995 us, total = 592.402 ms, Queueing time: mean = 78.069 us, max = 26.386 ms, min = -0.001 s, total = 2.435 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 24909 total (1 active), Execution time: mean = 459.213 us, total = 11.439 s, Queueing time: mean = 76.393 us, max = 4.063 ms, min = -0.000 s, total = 1.903 s 
[state-dump] NodeManager.deadline_timer.flush_free_objects - 6240 total (1 active), Execution time: mean = 9.507 us, total = 59.322 ms, Queueing time: mean = 178.557 us, max = 2.947 ms, min = 3.811 us, total = 1.114 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 6240 total (1 active), Execution time: mean = 17.366 us, total = 108.363 ms, Queueing time: mean = 76.232 us, max = 2.581 ms, min = 8.735 us, total = 475.689 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 6240 total (1 active), Execution time: mean = 3.162 us, total = 19.733 ms, Queueing time: mean = 182.858 us, max = 2.946 ms, min = 3.845 us, total = 1.141 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 6238 total (0 active), Execution time: mean = 614.116 us, total = 3.831 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 6238 total (0 active), Execution time: mean = 97.769 us, total = 609.880 ms, Queueing time: mean = 113.618 us, max = 2.934 ms, min = 4.027 us, total = 708.748 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2081 total (1 active), Execution time: mean = 10.148 us, total = 21.118 ms, Queueing time: mean = 75.769 us, max = 442.307 us, min = 12.181 us, total = 157.674 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1248 total (1 active), Execution time: mean = 585.204 us, total = 730.334 ms, Queueing time: mean = 343.852 us, max = 2.010 ms, min = 9.115 us, total = 429.127 ms [state-dump] NodeManager.GcsCheckAlive - 1248 total (1 active), Execution time: mean = 321.466 us, total = 401.190 ms, Queueing time: mean = 607.173 us, max = 2.567 ms, min = 6.690 us, total = 757.752 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1248 total (0 active), Execution time: mean = 56.147 us, total = 70.072 ms, Queueing time: mean = 112.670 us, max = 4.779 ms, min = 11.561 
us, total = 140.612 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1248 total (0 active), Execution time: mean = 1.605 ms, total = 2.003 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 624 total (1 active), Execution time: mean = 1.783 ms, total = 1.113 s, Queueing time: mean = 72.624 us, max = 248.460 us, min = 11.609 us, total = 45.317 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 104 total (1 active, 1 running), Execution time: mean = 2.727 ms, total = 283.567 ms, Queueing time: mean = 76.739 us, max = 325.100 us, min = 9.635 us, total = 7.981 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms 
[state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 12 total (1 active), Execution time: mean = 499.801 s, total = 5997.612 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 11 total (0 active), Execution time: mean = 377.030 us, total = 4.147 ms, Queueing time: mean = 127.582 us, max = 243.371 us, min = 20.299 us, total = 1.403 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 7 total (1 active), Execution time: mean = 7.313 us, total = 51.189 us, Queueing time: mean = 59.264 us, max = 97.290 us, min = 24.344 us, total = 414.847 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, 
min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 
active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:46:54,934 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:46:56,221 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 
[state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects 
pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 
inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 
[state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 549792 total (35 active) [state-dump] Queueing time: mean = 193.031 us, max = 59.826 s, min = -0.001 s, total = 106.127 s [state-dump] Execution time: mean = 11.093 ms, total = 6098.805 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 132258 total (0 active), Execution time: mean = 533.516 us, total = 70.562 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 132258 total (0 active), Execution time: mean = 35.576 us, total = 4.705 s, Queueing time: mean = 111.600 us, max = 3.003 ms, min = 1.846 us, total = 14.760 s [state-dump] RaySyncer.OnDemandBroadcasting - 62939 total (1 active), Execution time: mean = 11.010 us, total = 692.952 ms, Queueing time: mean = 92.673 us, max = 25.869 ms, min = 6.166 us, total = 5.833 s [state-dump] NodeManager.CheckGC - 62939 total (1 active), Execution time: mean = 3.851 us, total = 242.354 ms, Queueing time: mean = 98.982 us, max = 25.875 ms, min = -0.000 s, total = 6.230 s [state-dump] ObjectManager.UpdateAvailableMemory - 62939 total (0 active), Execution time: mean = 6.335 us, total = 398.698 ms, Queueing time: mean = 112.157 us, max = 45.939 ms, min = 2.228 us, total = 7.059 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 31487 total (1 active), Execution time: mean = 19.011 us, total = 598.595 ms, Queueing time: mean = 78.096 us, max = 26.386 ms, min = -0.001 s, total = 2.459 s [state-dump] 
MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 25148 total (1 active), Execution time: mean = 459.365 us, total = 11.552 s, Queueing time: mean = 76.408 us, max = 4.063 ms, min = -0.000 s, total = 1.922 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 6300 total (1 active), Execution time: mean = 9.514 us, total = 59.938 ms, Queueing time: mean = 178.567 us, max = 2.947 ms, min = 3.811 us, total = 1.125 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 6300 total (1 active), Execution time: mean = 17.381 us, total = 109.502 ms, Queueing time: mean = 76.298 us, max = 2.581 ms, min = 8.735 us, total = 480.678 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 6300 total (1 active), Execution time: mean = 3.163 us, total = 19.927 ms, Queueing time: mean = 182.871 us, max = 2.946 ms, min = 3.845 us, total = 1.152 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 6298 total (0 active), Execution time: mean = 614.141 us, total = 3.868 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 6298 total (0 active), Execution time: mean = 97.781 us, total = 615.823 ms, Queueing time: mean = 113.605 us, max = 2.934 ms, min = 4.027 us, total = 715.487 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2101 total (1 active), Execution time: mean = 10.131 us, total = 21.285 ms, Queueing time: mean = 75.679 us, max = 442.307 us, min = 12.181 us, total = 159.001 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1260 total (1 active), Execution time: mean = 585.278 us, total = 737.451 ms, Queueing time: mean = 343.931 us, max = 2.010 ms, min = 9.115 us, total = 433.353 ms [state-dump] NodeManager.GcsCheckAlive - 1260 total (1 active), Execution time: mean = 321.624 us, total = 405.246 ms, Queueing time: mean = 607.170 us, max = 2.567 ms, min = 6.690 us, total = 765.034 ms [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1260 total (0 active), Execution time: mean = 56.172 us, total = 70.777 ms, Queueing time: mean = 112.484 us, max = 4.779 ms, min = 11.561 us, total = 141.730 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1260 total (0 active), Execution time: mean = 1.605 ms, total = 2.022 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 630 total (1 active), Execution time: mean = 1.783 ms, total = 1.124 s, Queueing time: mean = 72.621 us, max = 248.460 us, min = 11.609 us, total = 45.751 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 105 total (1 active, 1 running), Execution time: mean = 2.730 ms, total = 286.670 ms, Queueing time: mean = 76.657 us, max = 325.100 us, min = 9.635 us, total = 8.049 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms 
[state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 12 total (1 active), Execution time: mean = 499.801 s, total = 5997.612 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 11 total (0 active), Execution time: mean = 377.030 us, total = 4.147 ms, Queueing time: mean = 127.582 us, max = 243.371 us, min = 20.299 us, total = 1.403 ms [state-dump] 
NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 8 total (1 active), Execution time: mean = 7.585 us, total = 60.683 us, Queueing time: mean = 59.928 us, max = 97.290 us, min = 24.344 us, total = 479.424 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 
117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution 
time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:47:54,935 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:47:56,224 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 
[state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] 
================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 
total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - 
cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 555027 total (35 active) [state-dump] Queueing time: mean = 191.893 us, max = 59.826 s, min = -0.001 s, total = 106.506 s [state-dump] Execution time: mean = 10.990 ms, total = 6099.678 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 133518 total (0 active), Execution time: mean = 533.153 us, total = 71.185 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 133518 total (0 active), Execution time: mean = 35.540 us, total = 4.745 s, Queueing time: mean = 111.503 us, max = 3.003 ms, min = 1.846 us, total = 14.888 s [state-dump] RaySyncer.OnDemandBroadcasting - 63539 total (1 active), Execution time: mean = 11.001 us, total = 698.975 ms, Queueing time: mean = 92.617 us, max = 25.869 ms, min = 6.166 us, total = 5.885 s [state-dump] NodeManager.CheckGC - 63539 total (1 active), Execution time: mean = 3.841 us, total = 244.053 ms, Queueing time: mean = 98.926 us, max = 25.875 ms, min = -0.000 s, total = 6.286 s [state-dump] ObjectManager.UpdateAvailableMemory - 63539 total (0 active), Execution time: mean = 6.329 us, total = 402.141 ms, Queueing time: mean = 112.036 us, max = 45.939 ms, min = 2.228 us, total = 7.119 s [state-dump] 
RayletWorkerPool.deadline_timer.kill_idle_workers - 31787 total (1 active), Execution time: mean = 18.992 us, total = 603.713 ms, Queueing time: mean = 77.986 us, max = 26.386 ms, min = -0.001 s, total = 2.479 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 25388 total (1 active), Execution time: mean = 459.339 us, total = 11.662 s, Queueing time: mean = 76.406 us, max = 4.063 ms, min = -0.000 s, total = 1.940 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 6360 total (1 active), Execution time: mean = 9.500 us, total = 60.422 ms, Queueing time: mean = 178.491 us, max = 2.947 ms, min = 3.811 us, total = 1.135 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 6360 total (1 active), Execution time: mean = 17.355 us, total = 110.378 ms, Queueing time: mean = 76.167 us, max = 2.581 ms, min = 7.438 us, total = 484.423 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 6360 total (1 active), Execution time: mean = 3.160 us, total = 20.099 ms, Queueing time: mean = 182.787 us, max = 2.946 ms, min = 3.845 us, total = 1.163 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 6358 total (0 active), Execution time: mean = 613.699 us, total = 3.902 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 6358 total (0 active), Execution time: mean = 97.744 us, total = 621.454 ms, Queueing time: mean = 113.470 us, max = 2.934 ms, min = 4.027 us, total = 721.442 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2121 total (1 active), Execution time: mean = 10.103 us, total = 21.429 ms, Queueing time: mean = 75.914 us, max = 564.135 us, min = 12.181 us, total = 161.014 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1272 total (1 active), Execution time: mean = 584.721 us, total = 743.765 ms, Queueing time: mean = 344.200 us, max = 2.010 ms, min = 9.115 us, total = 437.822 
ms [state-dump] NodeManager.GcsCheckAlive - 1272 total (1 active), Execution time: mean = 321.303 us, total = 408.697 ms, Queueing time: mean = 607.133 us, max = 2.567 ms, min = 6.690 us, total = 772.274 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1272 total (0 active), Execution time: mean = 56.120 us, total = 71.384 ms, Queueing time: mean = 112.319 us, max = 4.779 ms, min = 11.561 us, total = 142.869 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1272 total (0 active), Execution time: mean = 1.604 ms, total = 2.040 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 636 total (1 active), Execution time: mean = 1.783 ms, total = 1.134 s, Queueing time: mean = 72.622 us, max = 248.460 us, min = 11.609 us, total = 46.188 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 106 total (1 active, 1 running), Execution time: mean = 2.729 ms, total = 289.237 ms, Queueing time: mean = 76.382 us, max = 325.100 us, min = 9.635 us, total = 8.097 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, 
total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 12 total (1 active), Execution time: mean = 499.801 s, total = 5997.612 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] 
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 11 total (0 active), Execution time: mean = 377.030 us, total = 4.147 ms, Queueing time: mean = 127.582 us, max = 243.371 us, min = 20.299 us, total = 1.403 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 8 total (1 active), Execution time: mean = 7.585 us, total = 60.683 us, Queueing time: mean = 59.928 us, max = 97.290 us, min = 24.344 us, total = 479.424 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, 
total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 
total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 00:48:54,935 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:48:56,227 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 
94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled 
waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer 
state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe 
requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 560258 total (35 active) [state-dump] Queueing time: mean = 190.840 us, max = 59.826 s, min = -0.001 s, total = 106.920 s [state-dump] Execution time: mean = 10.889 ms, total = 6100.597 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 134778 total (0 active), Execution time: mean = 533.055 us, total = 71.844 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 134778 total (0 active), Execution time: mean = 35.537 us, total = 4.790 s, Queueing time: mean = 111.510 us, max = 3.003 ms, min = 1.846 us, total = 15.029 s [state-dump] RaySyncer.OnDemandBroadcasting - 64138 total (1 active), Execution time: mean = 11.009 us, total = 706.078 ms, Queueing time: mean = 92.605 us, max = 25.869 ms, min = 6.166 us, total = 5.939 s [state-dump] NodeManager.CheckGC - 64138 total (1 active), Execution time: mean = 3.834 us, total = 245.909 ms, Queueing time: mean = 98.928 us, max = 25.875 ms, min = -0.000 s, total = 6.345 
s [state-dump] ObjectManager.UpdateAvailableMemory - 64138 total (0 active), Execution time: mean = 6.330 us, total = 406.017 ms, Queueing time: mean = 112.038 us, max = 45.939 ms, min = 2.228 us, total = 7.186 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 32087 total (1 active), Execution time: mean = 18.998 us, total = 609.592 ms, Queueing time: mean = 78.021 us, max = 26.386 ms, min = -0.001 s, total = 2.503 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 25627 total (1 active), Execution time: mean = 459.401 us, total = 11.773 s, Queueing time: mean = 76.424 us, max = 4.063 ms, min = -0.000 s, total = 1.959 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 6420 total (1 active), Execution time: mean = 9.501 us, total = 60.997 ms, Queueing time: mean = 178.512 us, max = 2.947 ms, min = 3.811 us, total = 1.146 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 6420 total (1 active), Execution time: mean = 17.367 us, total = 111.499 ms, Queueing time: mean = 76.145 us, max = 2.581 ms, min = 7.438 us, total = 488.849 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 6420 total (1 active), Execution time: mean = 3.161 us, total = 20.294 ms, Queueing time: mean = 182.808 us, max = 2.946 ms, min = 3.845 us, total = 1.174 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 6418 total (0 active), Execution time: mean = 613.612 us, total = 3.938 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 6418 total (0 active), Execution time: mean = 97.709 us, total = 627.099 ms, Queueing time: mean = 113.495 us, max = 2.934 ms, min = 4.027 us, total = 728.410 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2141 total (1 active), Execution time: mean = 10.110 us, total = 21.646 ms, Queueing time: mean = 75.818 us, max = 564.135 us, min = 12.181 us, total = 
162.325 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1284 total (1 active), Execution time: mean = 584.876 us, total = 750.981 ms, Queueing time: mean = 344.021 us, max = 2.010 ms, min = 9.115 us, total = 441.723 ms [state-dump] NodeManager.GcsCheckAlive - 1284 total (1 active), Execution time: mean = 321.241 us, total = 412.473 ms, Queueing time: mean = 607.249 us, max = 2.567 ms, min = 6.690 us, total = 779.708 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1284 total (0 active), Execution time: mean = 56.115 us, total = 72.051 ms, Queueing time: mean = 112.204 us, max = 4.779 ms, min = 11.561 us, total = 144.070 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1284 total (0 active), Execution time: mean = 1.602 ms, total = 2.057 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 642 total (1 active), Execution time: mean = 1.783 ms, total = 1.145 s, Queueing time: mean = 72.561 us, max = 248.460 us, min = 11.609 us, total = 46.584 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 107 total (1 active, 1 running), Execution time: mean = 2.730 ms, total = 292.088 ms, Queueing time: mean = 75.848 us, max = 325.100 us, min = 9.635 us, total = 8.116 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, 
total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 12 total (1 active), Execution time: mean = 499.801 s, total = 5997.612 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 
total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 11 total (0 active), Execution time: mean = 377.030 us, total = 4.147 ms, Queueing time: mean = 127.582 us, max = 243.371 us, min = 20.299 us, total = 1.403 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 8 total (1 active), Execution time: mean = 7.585 us, total = 60.683 us, Queueing time: mean = 59.928 us, max = 97.290 us, min = 24.344 us, total = 479.424 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s 
[state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 
us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:49:54,935 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:49:56,230 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, 
node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] 
Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 
[state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 565493 total (35 active) [state-dump] Queueing time: mean = 189.757 us, 
max = 59.826 s, min = -0.001 s, total = 107.306 s [state-dump] Execution time: mean = 10.790 ms, total = 6101.438 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 136038 total (0 active), Execution time: mean = 532.490 us, total = 72.439 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 136038 total (0 active), Execution time: mean = 35.497 us, total = 4.829 s, Queueing time: mean = 111.446 us, max = 3.003 ms, min = 1.846 us, total = 15.161 s [state-dump] RaySyncer.OnDemandBroadcasting - 64738 total (1 active), Execution time: mean = 11.012 us, total = 712.894 ms, Queueing time: mean = 92.551 us, max = 25.869 ms, min = 6.166 us, total = 5.992 s [state-dump] NodeManager.CheckGC - 64738 total (1 active), Execution time: mean = 3.826 us, total = 247.699 ms, Queueing time: mean = 98.884 us, max = 25.875 ms, min = -0.000 s, total = 6.402 s [state-dump] ObjectManager.UpdateAvailableMemory - 64738 total (0 active), Execution time: mean = 6.325 us, total = 409.482 ms, Queueing time: mean = 111.955 us, max = 45.939 ms, min = 2.228 us, total = 7.248 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 32387 total (1 active), Execution time: mean = 18.989 us, total = 615.000 ms, Queueing time: mean = 77.947 us, max = 26.386 ms, min = -0.001 s, total = 2.524 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 25867 total (1 active), Execution time: mean = 459.247 us, total = 11.879 s, Queueing time: mean = 76.362 us, max = 4.063 ms, min = -0.000 s, total = 1.975 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 6480 total (1 active), Execution time: mean = 9.490 us, total = 61.493 ms, Queueing time: mean = 178.482 us, max = 2.947 ms, min = 3.811 us, total = 1.157 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 6480 total (1 active), Execution time: mean = 17.370 us, 
total = 112.559 ms, Queueing time: mean = 76.093 us, max = 2.581 ms, min = 7.438 us, total = 493.081 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 6480 total (1 active), Execution time: mean = 3.159 us, total = 20.470 ms, Queueing time: mean = 182.772 us, max = 2.946 ms, min = 3.845 us, total = 1.184 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 6478 total (0 active), Execution time: mean = 613.002 us, total = 3.971 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 6478 total (0 active), Execution time: mean = 97.656 us, total = 632.612 ms, Queueing time: mean = 113.421 us, max = 2.934 ms, min = 4.027 us, total = 734.743 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2161 total (1 active), Execution time: mean = 10.105 us, total = 21.837 ms, Queueing time: mean = 75.758 us, max = 564.135 us, min = 12.181 us, total = 163.712 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1296 total (1 active), Execution time: mean = 584.374 us, total = 757.349 ms, Queueing time: mean = 344.444 us, max = 2.010 ms, min = 9.115 us, total = 446.399 ms [state-dump] NodeManager.GcsCheckAlive - 1296 total (1 active), Execution time: mean = 321.165 us, total = 416.230 ms, Queueing time: mean = 607.163 us, max = 2.567 ms, min = 6.690 us, total = 786.883 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1296 total (0 active), Execution time: mean = 56.056 us, total = 72.648 ms, Queueing time: mean = 112.244 us, max = 4.779 ms, min = 11.561 us, total = 145.468 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1296 total (0 active), Execution time: mean = 1.601 ms, total = 2.075 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 648 total (1 
active), Execution time: mean = 1.783 ms, total = 1.155 s, Queueing time: mean = 72.479 us, max = 248.460 us, min = 11.609 us, total = 46.966 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 108 total (1 active, 1 running), Execution time: mean = 2.732 ms, total = 295.034 ms, Queueing time: mean = 75.795 us, max = 325.100 us, min = 9.635 us, total = 8.186 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: 
mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 12 total (1 active), Execution time: mean = 499.801 s, total = 5997.612 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 11 total (0 active), Execution time: mean = 377.030 us, total = 4.147 ms, Queueing time: mean = 127.582 us, max = 243.371 us, min = 20.299 us, total = 1.403 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, 
total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 8 total (1 active), Execution time: mean = 7.585 us, total = 60.683 us, Queueing time: mean = 59.928 us, max = 97.290 us, min = 24.344 us, total = 479.424 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 
total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: 
mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 00:50:54,935 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:50:56,233 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: 
[21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: 
[state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles 
being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 
[state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 570724 total (35 active) [state-dump] Queueing time: mean = 188.773 us, max = 59.826 s, min = -0.001 s, total = 107.737 s [state-dump] Execution time: mean = 10.692 ms, total = 6102.373 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 137298 total (0 active), Execution time: mean = 532.456 us, total = 73.105 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 137298 total (0 active), Execution time: mean = 35.505 us, total = 4.875 s, Queueing time: mean = 111.454 us, max = 3.003 ms, min = 1.846 us, total = 15.302 s [state-dump] RaySyncer.OnDemandBroadcasting - 65337 total (1 active), Execution time: mean = 11.033 us, total = 720.862 ms, Queueing time: mean = 92.645 us, max = 25.869 ms, min = 6.166 us, total = 6.053 s [state-dump] NodeManager.CheckGC - 65337 total (1 active), Execution time: mean = 3.820 us, total = 249.567 ms, Queueing time: mean = 99.004 us, max = 25.875 ms, min = -0.000 s, total = 6.469 s [state-dump] ObjectManager.UpdateAvailableMemory - 65337 total (0 active), Execution time: mean = 6.333 us, total = 413.789 ms, Queueing time: mean = 111.971 us, max = 45.939 ms, min = 2.228 us, total = 7.316 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 32687 total (1 active), Execution time: mean = 19.015 us, total = 621.554 ms, Queueing time: mean = 77.990 us, max = 26.386 ms, min = -0.001 s, total = 2.549 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 26106 total (1 active), Execution time: mean = 459.317 us, total = 11.991 s, Queueing time: mean = 76.370 us, max = 4.063 ms, min = -0.000 s, total = 1.994 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 6540 total (1 active), Execution time: mean = 9.510 us, total = 62.194 ms, Queueing time: mean = 178.511 
us, max = 2.947 ms, min = 3.811 us, total = 1.167 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 6540 total (1 active), Execution time: mean = 17.405 us, total = 113.831 ms, Queueing time: mean = 76.129 us, max = 2.581 ms, min = 7.438 us, total = 497.882 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 6540 total (1 active), Execution time: mean = 3.163 us, total = 20.687 ms, Queueing time: mean = 182.808 us, max = 2.946 ms, min = 3.845 us, total = 1.196 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 6538 total (0 active), Execution time: mean = 613.067 us, total = 4.008 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 6538 total (0 active), Execution time: mean = 97.686 us, total = 638.668 ms, Queueing time: mean = 113.470 us, max = 2.934 ms, min = 4.027 us, total = 741.867 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2181 total (1 active), Execution time: mean = 10.121 us, total = 22.075 ms, Queueing time: mean = 75.935 us, max = 564.135 us, min = 12.181 us, total = 165.614 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1308 total (1 active), Execution time: mean = 584.681 us, total = 764.763 ms, Queueing time: mean = 344.232 us, max = 2.010 ms, min = 9.115 us, total = 450.255 ms [state-dump] NodeManager.GcsCheckAlive - 1308 total (1 active), Execution time: mean = 321.341 us, total = 420.314 ms, Queueing time: mean = 607.167 us, max = 2.567 ms, min = 6.690 us, total = 794.175 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1308 total (0 active), Execution time: mean = 56.047 us, total = 73.310 ms, Queueing time: mean = 112.287 us, max = 4.779 ms, min = 11.561 us, total = 146.871 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1308 total (0 active), Execution time: mean = 1.601 ms, total = 2.094 s, 
Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 654 total (1 active), Execution time: mean = 1.783 ms, total = 1.166 s, Queueing time: mean = 72.537 us, max = 248.460 us, min = 11.609 us, total = 47.439 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 109 total (1 active, 1 running), Execution time: mean = 2.735 ms, total = 298.073 ms, Queueing time: mean = 75.654 us, max = 325.100 us, min = 9.635 us, total = 8.246 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 
s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 12 total (1 active), Execution time: mean = 499.801 s, total = 5997.612 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 11 total (0 active), Execution time: mean = 377.030 us, total = 4.147 ms, Queueing time: mean = 127.582 us, max = 243.371 us, min = 20.299 us, total = 1.403 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 
us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 8 total (1 active), Execution time: mean = 7.585 us, total = 60.683 us, Queueing time: mean = 59.928 us, max = 97.290 us, min = 24.344 us, total = 479.424 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, 
Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 00:51:54,936 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:51:56,236 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], 
memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - 
cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A 
[state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active 
subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 575958 total (35 active) [state-dump] Queueing time: mean = 187.790 us, max = 59.826 s, min = -0.001 s, total = 108.159 s [state-dump] Execution time: mean = 11.639 ms, total = 6703.309 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 138558 total (0 active), Execution time: mean = 532.450 us, total = 73.775 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 138558 total (0 active), Execution time: mean = 35.518 us, total = 4.921 s, Queueing time: mean = 111.462 us, max = 3.003 ms, min = 1.846 us, total = 15.444 s [state-dump] RaySyncer.OnDemandBroadcasting - 65936 total (1 active), Execution time: mean = 11.043 us, total = 728.120 ms, Queueing time: mean = 92.675 us, max = 25.869 ms, min = 6.166 us, total = 6.111 s [state-dump] NodeManager.CheckGC - 65936 total (1 active), Execution time: mean = 3.813 us, total = 251.409 ms, Queueing time: mean = 99.049 us, max = 25.875 ms, min = -0.000 s, total = 6.531 s [state-dump] ObjectManager.UpdateAvailableMemory - 65936 total (0 active), Execution time: mean = 6.340 us, total = 418.007 ms, Queueing time: mean = 111.984 us, max = 45.939 ms, min = 2.228 us, total = 7.384 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 32987 total (1 active), Execution time: mean = 19.046 us, total = 628.259 ms, Queueing time: mean = 78.031 us, max = 26.386 ms, min = -0.001 s, total = 2.574 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 26346 total (1 active), Execution time: mean = 459.328 us, total = 12.101 s, Queueing time: mean = 76.401 us, max = 4.063 ms, min = -0.000 s, total = 2.013 s 
[state-dump] NodeManager.deadline_timer.flush_free_objects - 6600 total (1 active), Execution time: mean = 9.518 us, total = 62.817 ms, Queueing time: mean = 178.498 us, max = 2.947 ms, min = 3.811 us, total = 1.178 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 6600 total (1 active), Execution time: mean = 17.432 us, total = 115.051 ms, Queueing time: mean = 76.203 us, max = 2.581 ms, min = 7.438 us, total = 502.941 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 6600 total (1 active), Execution time: mean = 3.163 us, total = 20.878 ms, Queueing time: mean = 182.800 us, max = 2.946 ms, min = 3.845 us, total = 1.206 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 6598 total (0 active), Execution time: mean = 612.967 us, total = 4.044 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 6598 total (0 active), Execution time: mean = 97.707 us, total = 644.673 ms, Queueing time: mean = 113.434 us, max = 2.934 ms, min = 4.027 us, total = 748.437 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2201 total (1 active), Execution time: mean = 10.134 us, total = 22.305 ms, Queueing time: mean = 76.100 us, max = 564.135 us, min = 12.181 us, total = 167.497 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1320 total (1 active), Execution time: mean = 584.314 us, total = 771.295 ms, Queueing time: mean = 344.588 us, max = 2.010 ms, min = 9.115 us, total = 454.856 ms [state-dump] NodeManager.GcsCheckAlive - 1320 total (1 active), Execution time: mean = 321.399 us, total = 424.246 ms, Queueing time: mean = 607.067 us, max = 2.567 ms, min = 6.690 us, total = 801.329 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1320 total (0 active), Execution time: mean = 56.073 us, total = 74.016 ms, Queueing time: mean = 112.337 us, max = 4.779 ms, min = 11.561 
us, total = 148.285 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1320 total (0 active), Execution time: mean = 1.601 ms, total = 2.114 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 660 total (1 active), Execution time: mean = 1.783 ms, total = 1.177 s, Queueing time: mean = 72.459 us, max = 248.460 us, min = 11.609 us, total = 47.823 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 110 total (1 active, 1 running), Execution time: mean = 2.737 ms, total = 301.028 ms, Queueing time: mean = 75.812 us, max = 325.100 us, min = 9.635 us, total = 8.339 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 
s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 13 total (1 active), Execution time: mean = 507.509 s, total = 6597.612 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 12 total (0 active), Execution time: mean = 379.678 us, total = 4.556 ms, Queueing time: mean = 140.234 us, max = 279.406 us, min = 20.299 us, total = 1.683 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms 
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 8 total (1 active), Execution time: mean = 7.585 us, total = 60.683 us, Queueing time: mean = 59.928 us, max = 97.290 us, min = 24.344 us, total = 479.424 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min 
= 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), 
Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 00:52:54,936 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:52:56,240 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] 
num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 
0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 
unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel 
WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 581191 total (35 active) [state-dump] Queueing time: mean = 186.842 us, max = 59.826 s, min = -0.001 s, total = 108.591 s [state-dump] Execution time: mean = 11.535 ms, total = 6704.255 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 139818 total (0 active), Execution time: mean = 532.523 us, total = 74.456 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 139818 total (0 active), Execution time: mean = 35.524 us, total = 4.967 s, Queueing time: mean = 111.498 us, max = 3.003 ms, min = 1.846 us, total = 15.589 s [state-dump] RaySyncer.OnDemandBroadcasting - 66536 total (1 active), Execution time: mean = 11.061 us, total = 735.943 ms, Queueing time: mean = 92.776 us, max = 25.869 ms, min = 6.166 us, total = 6.173 s [state-dump] NodeManager.CheckGC - 66536 total (1 active), Execution time: mean = 3.807 us, total = 253.291 ms, Queueing time: mean = 99.173 us, max = 25.875 ms, min = -0.000 s, total = 6.599 s [state-dump] ObjectManager.UpdateAvailableMemory - 66536 total (0 active), Execution time: mean = 6.346 us, total = 422.207 ms, Queueing time: mean = 112.003 us, max = 45.939 ms, min = 2.228 us, total = 7.452 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 33286 total (1 active), Execution time: mean = 19.067 us, total = 634.671 ms, Queueing time: mean = 78.074 us, max = 26.386 ms, min = -0.001 s, total = 2.599 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 26585 
total (1 active), Execution time: mean = 459.416 us, total = 12.214 s, Queueing time: mean = 76.453 us, max = 4.063 ms, min = -0.000 s, total = 2.033 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 6660 total (1 active), Execution time: mean = 9.539 us, total = 63.531 ms, Queueing time: mean = 178.295 us, max = 2.947 ms, min = 2.735 us, total = 1.187 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 6660 total (1 active), Execution time: mean = 17.454 us, total = 116.244 ms, Queueing time: mean = 76.217 us, max = 2.581 ms, min = 7.438 us, total = 507.605 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 6660 total (1 active), Execution time: mean = 3.165 us, total = 21.080 ms, Queueing time: mean = 182.610 us, max = 2.946 ms, min = 3.845 us, total = 1.216 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 6658 total (0 active), Execution time: mean = 612.988 us, total = 4.081 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 6658 total (0 active), Execution time: mean = 97.726 us, total = 650.661 ms, Queueing time: mean = 113.423 us, max = 2.934 ms, min = 4.027 us, total = 755.168 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2221 total (1 active), Execution time: mean = 10.136 us, total = 22.512 ms, Queueing time: mean = 76.144 us, max = 564.135 us, min = 12.181 us, total = 169.116 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1332 total (1 active), Execution time: mean = 583.987 us, total = 777.871 ms, Queueing time: mean = 343.972 us, max = 2.010 ms, min = 9.115 us, total = 458.170 ms [state-dump] NodeManager.GcsCheckAlive - 1332 total (1 active), Execution time: mean = 321.334 us, total = 428.016 ms, Queueing time: mean = 606.210 us, max = 2.567 ms, min = 6.690 us, total = 807.472 ms [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1332 total (0 active), Execution time: mean = 56.091 us, total = 74.713 ms, Queueing time: mean = 112.366 us, max = 4.779 ms, min = 11.561 us, total = 149.672 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1332 total (0 active), Execution time: mean = 1.600 ms, total = 2.131 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 666 total (1 active), Execution time: mean = 1.781 ms, total = 1.186 s, Queueing time: mean = 72.469 us, max = 248.460 us, min = 11.609 us, total = 48.265 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 111 total (1 active, 1 running), Execution time: mean = 2.740 ms, total = 304.124 ms, Queueing time: mean = 75.713 us, max = 325.100 us, min = 9.635 us, total = 8.404 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms 
[state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 13 total (1 active), Execution time: mean = 507.509 s, total = 6597.612 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 12 total (0 active), Execution time: mean = 379.678 us, total = 4.556 ms, Queueing time: mean = 140.234 us, max = 279.406 us, min = 20.299 us, total = 1.683 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] 
NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 8 total (1 active), Execution time: mean = 7.585 us, total = 60.683 us, Queueing time: mean = 59.928 us, max = 97.290 us, min = 24.344 us, total = 479.424 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 
117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution 
time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 00:53:54,936 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:53:56,243 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 
[state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] 
================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 
total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - 
cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 586423 total (35 active) [state-dump] Queueing time: mean = 185.908 us, max = 59.826 s, min = -0.001 s, total = 109.021 s [state-dump] Execution time: mean = 11.434 ms, total = 6705.213 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 141078 total (0 active), Execution time: mean = 532.607 us, total = 75.139 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 141078 total (0 active), Execution time: mean = 35.541 us, total = 5.014 s, Queueing time: mean = 111.505 us, max = 3.003 ms, min = 1.846 us, total = 15.731 s [state-dump] RaySyncer.OnDemandBroadcasting - 67135 total (1 active), Execution time: mean = 11.070 us, total = 743.158 ms, Queueing time: mean = 92.812 us, max = 25.869 ms, min = 6.166 us, total = 6.231 s [state-dump] NodeManager.CheckGC - 67135 total (1 active), Execution time: mean = 3.801 us, total = 255.208 ms, Queueing time: mean = 99.222 us, max = 25.875 ms, min = -0.000 s, total = 6.661 s [state-dump] ObjectManager.UpdateAvailableMemory - 67135 total (0 active), Execution time: mean = 6.353 us, total = 426.540 ms, Queueing time: mean = 112.088 us, max = 45.939 ms, min = 2.228 us, total = 7.525 s [state-dump] 
RayletWorkerPool.deadline_timer.kill_idle_workers - 33586 total (1 active), Execution time: mean = 19.084 us, total = 640.956 ms, Queueing time: mean = 78.088 us, max = 26.386 ms, min = -0.001 s, total = 2.623 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 26825 total (1 active), Execution time: mean = 459.616 us, total = 12.329 s, Queueing time: mean = 76.495 us, max = 4.063 ms, min = -0.000 s, total = 2.052 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 6720 total (1 active), Execution time: mean = 9.547 us, total = 64.159 ms, Queueing time: mean = 178.416 us, max = 2.947 ms, min = 2.735 us, total = 1.199 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 6720 total (1 active), Execution time: mean = 17.491 us, total = 117.537 ms, Queueing time: mean = 76.251 us, max = 2.581 ms, min = 7.438 us, total = 512.405 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 6720 total (1 active), Execution time: mean = 3.166 us, total = 21.273 ms, Queueing time: mean = 182.735 us, max = 2.946 ms, min = 3.845 us, total = 1.228 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 6718 total (0 active), Execution time: mean = 613.180 us, total = 4.119 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 6718 total (0 active), Execution time: mean = 97.758 us, total = 656.738 ms, Queueing time: mean = 113.479 us, max = 2.934 ms, min = 4.027 us, total = 762.353 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2241 total (1 active), Execution time: mean = 10.139 us, total = 22.722 ms, Queueing time: mean = 76.186 us, max = 564.135 us, min = 12.181 us, total = 170.733 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1344 total (1 active), Execution time: mean = 583.817 us, total = 784.651 ms, Queueing time: mean = 344.691 us, max = 2.010 ms, min = 9.115 us, total = 463.265 
ms [state-dump] NodeManager.GcsCheckAlive - 1344 total (1 active), Execution time: mean = 321.601 us, total = 432.231 ms, Queueing time: mean = 606.488 us, max = 2.567 ms, min = 6.690 us, total = 815.120 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1344 total (0 active), Execution time: mean = 56.136 us, total = 75.446 ms, Queueing time: mean = 112.311 us, max = 4.779 ms, min = 11.561 us, total = 150.946 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1344 total (0 active), Execution time: mean = 1.601 ms, total = 2.151 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 672 total (1 active), Execution time: mean = 1.782 ms, total = 1.198 s, Queueing time: mean = 72.460 us, max = 248.460 us, min = 11.609 us, total = 48.693 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 112 total (1 active, 1 running), Execution time: mean = 2.742 ms, total = 307.160 ms, Queueing time: mean = 75.813 us, max = 325.100 us, min = 9.635 us, total = 8.491 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, 
total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 13 total (1 active), Execution time: mean = 507.509 s, total = 6597.612 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 12 total (0 active), Execution time: mean = 379.678 us, total = 4.556 ms, Queueing time: mean = 140.234 us, max = 279.406 us, min = 20.299 us, total = 1.683 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 
9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 8 total (1 active), Execution time: mean = 7.585 us, total = 60.683 us, Queueing time: mean = 59.928 us, max = 97.290 us, min = 24.344 us, total = 479.424 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] 
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: 
mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:54:54,937 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:54:56,246 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 
================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled 
unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: 
[state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative 
unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 591654 total (35 active) [state-dump] Queueing time: mean = 185.026 us, max = 59.826 s, min = -0.001 s, total = 109.471 s [state-dump] Execution time: mean = 11.335 ms, total = 6706.181 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 142338 total (0 active), Execution time: mean = 532.771 us, total = 75.834 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 142338 total (0 active), Execution time: mean = 35.559 us, total = 5.061 s, Queueing time: mean = 111.564 us, max = 3.003 ms, min = 1.846 us, total = 15.880 s [state-dump] RaySyncer.OnDemandBroadcasting - 67734 total (1 active), Execution time: mean = 11.085 us, total = 750.822 ms, Queueing time: mean = 92.921 us, max = 25.869 ms, min = 6.166 us, total = 6.294 s [state-dump] NodeManager.CheckGC - 67734 total (1 active), Execution time: mean = 3.796 us, total = 257.086 ms, Queueing time: mean = 99.351 us, max = 25.875 ms, min = -0.000 s, total = 6.729 s [state-dump] 
ObjectManager.UpdateAvailableMemory - 67734 total (0 active), Execution time: mean = 6.361 us, total = 430.836 ms, Queueing time: mean = 112.198 us, max = 45.939 ms, min = 2.228 us, total = 7.600 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 33886 total (1 active), Execution time: mean = 19.103 us, total = 647.315 ms, Queueing time: mean = 78.145 us, max = 26.386 ms, min = -0.001 s, total = 2.648 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 27064 total (1 active), Execution time: mean = 459.818 us, total = 12.445 s, Queueing time: mean = 76.546 us, max = 4.063 ms, min = -0.000 s, total = 2.072 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 6780 total (1 active), Execution time: mean = 9.555 us, total = 64.784 ms, Queueing time: mean = 178.456 us, max = 2.947 ms, min = 2.735 us, total = 1.210 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 6780 total (1 active), Execution time: mean = 17.506 us, total = 118.688 ms, Queueing time: mean = 76.245 us, max = 2.581 ms, min = 7.438 us, total = 516.941 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 6780 total (1 active), Execution time: mean = 3.166 us, total = 21.469 ms, Queueing time: mean = 182.779 us, max = 2.946 ms, min = 3.845 us, total = 1.239 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 6778 total (0 active), Execution time: mean = 613.516 us, total = 4.158 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 6778 total (0 active), Execution time: mean = 97.809 us, total = 662.949 ms, Queueing time: mean = 113.655 us, max = 2.934 ms, min = 4.027 us, total = 770.352 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2261 total (1 active), Execution time: mean = 10.131 us, total = 22.907 ms, Queueing time: mean = 76.139 us, max = 564.135 us, min = 12.181 us, total = 172.151 ms 
[state-dump] NodeManager.deadline_timer.record_metrics - 1356 total (1 active), Execution time: mean = 583.511 us, total = 791.241 ms, Queueing time: mean = 345.373 us, max = 2.010 ms, min = 9.115 us, total = 468.325 ms [state-dump] NodeManager.GcsCheckAlive - 1356 total (1 active), Execution time: mean = 321.658 us, total = 436.168 ms, Queueing time: mean = 606.748 us, max = 2.567 ms, min = 6.690 us, total = 822.750 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1356 total (0 active), Execution time: mean = 56.145 us, total = 76.132 ms, Queueing time: mean = 112.523 us, max = 4.779 ms, min = 11.561 us, total = 152.581 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1356 total (0 active), Execution time: mean = 1.600 ms, total = 2.169 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 678 total (1 active), Execution time: mean = 1.783 ms, total = 1.209 s, Queueing time: mean = 72.583 us, max = 248.460 us, min = 11.609 us, total = 49.211 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 113 total (1 active, 1 running), Execution time: mean = 2.745 ms, total = 310.221 ms, Queueing time: mean = 75.892 us, max = 325.100 us, min = 9.635 us, total = 8.576 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 
956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 13 total (1 active), Execution time: mean = 507.509 s, total = 6597.612 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 12 total (0 active), Execution time: mean = 379.678 us, total = 4.556 ms, Queueing time: mean = 140.234 us, max = 279.406 us, min = 
20.299 us, total = 1.683 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 8 total (1 active), Execution time: mean = 7.585 us, total = 60.683 us, Queueing time: mean = 59.928 us, max = 97.290 us, min = 24.344 us, total = 479.424 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing 
time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:55:54,937 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:55:56,249 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, 
node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] 
Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 
[state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 596889 total (35 active) [state-dump] Queueing time: mean = 184.075 us, 
max = 59.826 s, min = -0.001 s, total = 109.872 s [state-dump] Execution time: mean = 11.237 ms, total = 6707.095 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 143598 total (0 active), Execution time: mean = 532.663 us, total = 76.489 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 143598 total (0 active), Execution time: mean = 35.540 us, total = 5.103 s, Queueing time: mean = 111.542 us, max = 3.003 ms, min = 1.846 us, total = 16.017 s [state-dump] RaySyncer.OnDemandBroadcasting - 68334 total (1 active), Execution time: mean = 11.086 us, total = 757.541 ms, Queueing time: mean = 92.903 us, max = 25.869 ms, min = 6.166 us, total = 6.348 s [state-dump] NodeManager.CheckGC - 68334 total (1 active), Execution time: mean = 3.788 us, total = 258.817 ms, Queueing time: mean = 99.341 us, max = 25.875 ms, min = -0.000 s, total = 6.788 s [state-dump] ObjectManager.UpdateAvailableMemory - 68334 total (0 active), Execution time: mean = 6.359 us, total = 434.518 ms, Queueing time: mean = 112.101 us, max = 45.939 ms, min = 2.228 us, total = 7.660 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 34186 total (1 active), Execution time: mean = 19.094 us, total = 652.741 ms, Queueing time: mean = 78.071 us, max = 26.386 ms, min = -0.001 s, total = 2.669 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 27304 total (1 active), Execution time: mean = 459.768 us, total = 12.553 s, Queueing time: mean = 76.509 us, max = 4.063 ms, min = -0.000 s, total = 2.089 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 6840 total (1 active), Execution time: mean = 9.552 us, total = 65.334 ms, Queueing time: mean = 178.646 us, max = 2.947 ms, min = 2.735 us, total = 1.222 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 6840 total (1 active), Execution time: mean = 17.503 us, 
total = 119.723 ms, Queueing time: mean = 76.187 us, max = 2.581 ms, min = 7.438 us, total = 521.119 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 6840 total (1 active), Execution time: mean = 3.165 us, total = 21.652 ms, Queueing time: mean = 182.968 us, max = 2.946 ms, min = 3.845 us, total = 1.251 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 6838 total (0 active), Execution time: mean = 613.574 us, total = 4.196 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 6838 total (0 active), Execution time: mean = 97.799 us, total = 668.748 ms, Queueing time: mean = 113.606 us, max = 2.934 ms, min = 4.027 us, total = 776.836 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2281 total (1 active), Execution time: mean = 10.125 us, total = 23.095 ms, Queueing time: mean = 76.141 us, max = 564.135 us, min = 12.181 us, total = 173.677 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1368 total (1 active), Execution time: mean = 584.447 us, total = 799.523 ms, Queueing time: mean = 345.298 us, max = 2.010 ms, min = 9.115 us, total = 472.368 ms [state-dump] NodeManager.GcsCheckAlive - 1368 total (1 active), Execution time: mean = 321.600 us, total = 439.948 ms, Queueing time: mean = 607.739 us, max = 2.567 ms, min = 6.690 us, total = 831.387 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1368 total (0 active), Execution time: mean = 56.175 us, total = 76.847 ms, Queueing time: mean = 112.486 us, max = 4.779 ms, min = 11.561 us, total = 153.880 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1368 total (0 active), Execution time: mean = 1.599 ms, total = 2.187 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 684 total (1 
active), Execution time: mean = 1.784 ms, total = 1.220 s, Queueing time: mean = 72.522 us, max = 248.460 us, min = 11.609 us, total = 49.605 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 114 total (1 active, 1 running), Execution time: mean = 2.749 ms, total = 313.402 ms, Queueing time: mean = 76.414 us, max = 325.100 us, min = 9.635 us, total = 8.711 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: 
mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 13 total (1 active), Execution time: mean = 507.509 s, total = 6597.612 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 12 total (0 active), Execution time: mean = 379.678 us, total = 4.556 ms, Queueing time: mean = 140.234 us, max = 279.406 us, min = 20.299 us, total = 1.683 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, 
total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 8 total (1 active), Execution time: mean = 7.585 us, total = 60.683 us, Queueing time: mean = 59.928 us, max = 97.290 us, min = 24.344 us, total = 479.424 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 
total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: 
mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 00:56:54,937 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:56:56,252 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: 
[21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: 
[state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles 
being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 
[state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 602120 total (35 active) [state-dump] Queueing time: mean = 183.176 us, max = 59.826 s, min = -0.001 s, total = 110.294 s [state-dump] Execution time: mean = 11.141 ms, total = 6708.061 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 144858 total (0 active), Execution time: mean = 532.796 us, total = 77.180 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 144858 total (0 active), Execution time: mean = 35.573 us, total = 5.153 s, Queueing time: mean = 111.588 us, max = 3.003 ms, min = 1.846 us, total = 16.164 s [state-dump] RaySyncer.OnDemandBroadcasting - 68933 total (1 active), Execution time: mean = 11.094 us, total = 764.755 ms, Queueing time: mean = 92.917 us, max = 25.869 ms, min = 6.166 us, total = 6.405 s [state-dump] NodeManager.CheckGC - 68933 total (1 active), Execution time: mean = 3.782 us, total = 260.697 ms, Queueing time: mean = 99.368 us, max = 25.875 ms, min = -0.000 s, total = 6.850 s [state-dump] ObjectManager.UpdateAvailableMemory - 68933 total (0 active), Execution time: mean = 6.366 us, total = 438.794 ms, Queueing time: mean = 112.115 us, max = 45.939 ms, min = 2.228 us, total = 7.728 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 34486 total (1 active), Execution time: mean = 19.108 us, total = 658.970 ms, Queueing time: mean = 78.075 us, max = 26.386 ms, min = -0.001 s, total = 2.692 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 27543 total (1 active), Execution time: mean = 459.947 us, total = 12.668 s, Queueing time: mean = 76.578 us, max = 4.063 ms, min = -0.000 s, total = 2.109 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 6900 total (1 active), Execution time: mean = 9.568 us, total = 66.019 ms, Queueing time: mean = 178.536 
us, max = 2.947 ms, min = 2.735 us, total = 1.232 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 6900 total (1 active), Execution time: mean = 17.519 us, total = 120.881 ms, Queueing time: mean = 76.175 us, max = 2.581 ms, min = 7.438 us, total = 525.608 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 6900 total (1 active), Execution time: mean = 3.166 us, total = 21.848 ms, Queueing time: mean = 182.867 us, max = 2.946 ms, min = 3.845 us, total = 1.262 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 6898 total (0 active), Execution time: mean = 613.858 us, total = 4.234 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 6898 total (0 active), Execution time: mean = 97.847 us, total = 674.951 ms, Queueing time: mean = 113.628 us, max = 2.934 ms, min = 4.027 us, total = 783.803 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2301 total (1 active), Execution time: mean = 10.135 us, total = 23.320 ms, Queueing time: mean = 76.201 us, max = 564.135 us, min = 12.181 us, total = 175.339 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1380 total (1 active), Execution time: mean = 584.728 us, total = 806.925 ms, Queueing time: mean = 344.568 us, max = 2.010 ms, min = 9.115 us, total = 475.504 ms [state-dump] NodeManager.GcsCheckAlive - 1380 total (1 active), Execution time: mean = 321.776 us, total = 444.051 ms, Queueing time: mean = 607.093 us, max = 2.567 ms, min = 6.690 us, total = 837.788 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1380 total (0 active), Execution time: mean = 56.205 us, total = 77.562 ms, Queueing time: mean = 112.559 us, max = 4.779 ms, min = 11.561 us, total = 155.331 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1380 total (0 active), Execution time: mean = 1.598 ms, total = 2.205 s, 
Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 690 total (1 active), Execution time: mean = 1.784 ms, total = 1.231 s, Queueing time: mean = 72.524 us, max = 248.460 us, min = 11.609 us, total = 50.041 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 115 total (1 active, 1 running), Execution time: mean = 2.751 ms, total = 316.341 ms, Queueing time: mean = 76.406 us, max = 325.100 us, min = 9.635 us, total = 8.787 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 
269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 13 total (1 active), Execution time: mean = 507.509 s, total = 6597.612 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 12 total (0 active), Execution time: mean = 379.678 us, total = 4.556 ms, Queueing time: mean = 140.234 us, max = 279.406 us, min = 20.299 us, total = 1.683 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 
s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 8 total (1 active), Execution time: mean = 7.585 us, total = 60.683 us, Queueing time: mean = 59.928 us, max = 97.290 us, min = 24.344 us, total = 479.424 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 
us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:57:54,938 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:57:56,255 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], 
memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - 
cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A 
[state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active 
subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 607355 total (35 active) [state-dump] Queueing time: mean = 182.313 us, max = 59.826 s, min = -0.001 s, total = 110.729 s [state-dump] Execution time: mean = 11.046 ms, total = 6709.000 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 146118 total (0 active), Execution time: mean = 532.776 us, total = 77.848 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 146118 total (0 active), Execution time: mean = 35.576 us, total = 5.198 s, Queueing time: mean = 111.611 us, max = 3.003 ms, min = 1.846 us, total = 16.308 s [state-dump] RaySyncer.OnDemandBroadcasting - 69533 total (1 active), Execution time: mean = 11.106 us, total = 772.218 ms, Queueing time: mean = 92.972 us, max = 25.869 ms, min = 6.166 us, total = 6.465 s [state-dump] NodeManager.CheckGC - 69533 total (1 active), Execution time: mean = 3.778 us, total = 262.675 ms, Queueing time: mean = 99.438 us, max = 25.875 ms, min = -0.000 s, total = 6.914 s [state-dump] ObjectManager.UpdateAvailableMemory - 69533 total (0 active), Execution time: mean = 6.372 us, total = 443.081 ms, Queueing time: mean = 112.192 us, max = 45.939 ms, min = 2.228 us, total = 7.801 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 34786 total (1 active), Execution time: mean = 19.131 us, total = 665.486 ms, Queueing time: mean = 78.085 us, max = 26.386 ms, min = -0.001 s, total = 2.716 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 27783 total (1 active), Execution time: mean = 460.129 us, total = 12.784 s, Queueing time: mean = 76.594 us, max = 4.063 ms, min = -0.000 s, total = 2.128 s 
[state-dump] NodeManager.deadline_timer.flush_free_objects - 6960 total (1 active), Execution time: mean = 9.575 us, total = 66.640 ms, Queueing time: mean = 178.641 us, max = 2.947 ms, min = 2.735 us, total = 1.243 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 6960 total (1 active), Execution time: mean = 17.544 us, total = 122.108 ms, Queueing time: mean = 76.184 us, max = 2.581 ms, min = 7.438 us, total = 530.244 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 6960 total (1 active), Execution time: mean = 3.169 us, total = 22.057 ms, Queueing time: mean = 182.975 us, max = 2.946 ms, min = 3.845 us, total = 1.274 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 6958 total (0 active), Execution time: mean = 613.954 us, total = 4.272 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 6958 total (0 active), Execution time: mean = 97.877 us, total = 681.030 ms, Queueing time: mean = 113.655 us, max = 2.934 ms, min = 4.027 us, total = 790.811 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2321 total (1 active), Execution time: mean = 10.148 us, total = 23.553 ms, Queueing time: mean = 76.249 us, max = 564.135 us, min = 12.181 us, total = 176.973 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1392 total (1 active), Execution time: mean = 584.178 us, total = 813.176 ms, Queueing time: mean = 345.619 us, max = 2.010 ms, min = 9.115 us, total = 481.102 ms [state-dump] NodeManager.GcsCheckAlive - 1392 total (1 active), Execution time: mean = 321.978 us, total = 448.194 ms, Queueing time: mean = 607.422 us, max = 2.567 ms, min = 6.690 us, total = 845.532 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1392 total (0 active), Execution time: mean = 56.192 us, total = 78.220 ms, Queueing time: mean = 112.390 us, max = 4.779 ms, min = 11.561 
us, total = 156.447 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1392 total (0 active), Execution time: mean = 1.598 ms, total = 2.224 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 696 total (1 active), Execution time: mean = 1.785 ms, total = 1.242 s, Queueing time: mean = 72.620 us, max = 248.460 us, min = 11.609 us, total = 50.543 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 116 total (1 active, 1 running), Execution time: mean = 2.750 ms, total = 319.050 ms, Queueing time: mean = 76.612 us, max = 325.100 us, min = 9.635 us, total = 8.887 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 
s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 13 total (1 active), Execution time: mean = 507.509 s, total = 6597.612 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 12 total (0 active), Execution time: mean = 379.678 us, total = 4.556 ms, Queueing time: mean = 140.234 us, max = 279.406 us, min = 20.299 us, total = 1.683 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms 
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 8 total (1 active), Execution time: mean = 7.585 us, total = 60.683 us, Queueing time: mean = 59.928 us, max = 97.290 us, min = 24.344 us, total = 479.424 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min 
= 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), 
Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:58:54,938 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:58:56,258 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] 
num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 
0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 
unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel 
WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 612586 total (35 active) [state-dump] Queueing time: mean = 181.460 us, max = 59.826 s, min = -0.001 s, total = 111.160 s [state-dump] Execution time: mean = 10.953 ms, total = 6709.959 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 147378 total (0 active), Execution time: mean = 532.892 us, total = 78.537 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 147378 total (0 active), Execution time: mean = 35.587 us, total = 5.245 s, Queueing time: mean = 111.656 us, max = 3.003 ms, min = 1.846 us, total = 16.456 s [state-dump] RaySyncer.OnDemandBroadcasting - 70132 total (1 active), Execution time: mean = 11.120 us, total = 779.834 ms, Queueing time: mean = 93.002 us, max = 25.869 ms, min = 6.166 us, total = 6.522 s [state-dump] NodeManager.CheckGC - 70132 total (1 active), Execution time: mean = 3.772 us, total = 264.533 ms, Queueing time: mean = 99.487 us, max = 25.875 ms, min = -0.000 s, total = 6.977 s [state-dump] ObjectManager.UpdateAvailableMemory - 70132 total (0 active), Execution time: mean = 6.378 us, total = 447.285 ms, Queueing time: mean = 112.239 us, max = 45.939 ms, min = 2.228 us, total = 7.872 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 35086 total (1 active), Execution time: mean = 19.136 us, total = 671.393 ms, Queueing time: mean = 78.060 us, max = 26.386 ms, min = -0.001 s, total = 2.739 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 28022 
total (1 active), Execution time: mean = 460.301 us, total = 12.899 s, Queueing time: mean = 76.645 us, max = 4.063 ms, min = -0.000 s, total = 2.148 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 7020 total (1 active), Execution time: mean = 9.580 us, total = 67.254 ms, Queueing time: mean = 178.733 us, max = 2.947 ms, min = 2.735 us, total = 1.255 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 7020 total (1 active), Execution time: mean = 17.563 us, total = 123.291 ms, Queueing time: mean = 76.148 us, max = 2.581 ms, min = 7.438 us, total = 534.558 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 7020 total (1 active), Execution time: mean = 3.170 us, total = 22.253 ms, Queueing time: mean = 183.069 us, max = 2.946 ms, min = 3.845 us, total = 1.285 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 7018 total (0 active), Execution time: mean = 614.027 us, total = 4.309 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 7018 total (0 active), Execution time: mean = 97.898 us, total = 687.049 ms, Queueing time: mean = 113.667 us, max = 2.934 ms, min = 4.027 us, total = 797.716 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2341 total (1 active), Execution time: mean = 10.143 us, total = 23.744 ms, Queueing time: mean = 76.305 us, max = 564.135 us, min = 12.181 us, total = 178.631 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1404 total (1 active), Execution time: mean = 584.321 us, total = 820.387 ms, Queueing time: mean = 345.997 us, max = 2.010 ms, min = 9.115 us, total = 485.780 ms [state-dump] NodeManager.GcsCheckAlive - 1404 total (1 active), Execution time: mean = 322.156 us, total = 452.307 ms, Queueing time: mean = 607.734 us, max = 2.567 ms, min = 6.690 us, total = 853.258 ms [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1404 total (0 active), Execution time: mean = 56.209 us, total = 78.918 ms, Queueing time: mean = 112.483 us, max = 4.779 ms, min = 11.561 us, total = 157.926 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1404 total (0 active), Execution time: mean = 1.597 ms, total = 2.242 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 702 total (1 active), Execution time: mean = 1.786 ms, total = 1.254 s, Queueing time: mean = 72.766 us, max = 248.460 us, min = 11.609 us, total = 51.082 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 117 total (1 active, 1 running), Execution time: mean = 2.750 ms, total = 321.794 ms, Queueing time: mean = 76.475 us, max = 325.100 us, min = 9.635 us, total = 8.948 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms 
[state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 13 total (1 active), Execution time: mean = 507.509 s, total = 6597.612 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 12 total (0 active), Execution time: mean = 379.678 us, total = 4.556 ms, Queueing time: mean = 140.234 us, max = 279.406 us, min = 20.299 us, total = 1.683 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] 
NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 8 total (1 active), Execution time: mean = 7.585 us, total = 60.683 us, Queueing time: mean = 59.928 us, max = 97.290 us, min = 24.344 us, total = 479.424 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 
117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution 
time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 00:59:54,938 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 00:59:56,261 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 
[state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] 
================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 
total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - 
cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 617818 total (35 active) [state-dump] Queueing time: mean = 180.644 us, max = 59.826 s, min = -0.001 s, total = 111.605 s [state-dump] Execution time: mean = 10.862 ms, total = 6710.877 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 148638 total (0 active), Execution time: mean = 532.820 us, total = 79.197 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 148638 total (0 active), Execution time: mean = 35.557 us, total = 5.285 s, Queueing time: mean = 111.834 us, max = 3.855 ms, min = 1.846 us, total = 16.623 s [state-dump] RaySyncer.OnDemandBroadcasting - 70731 total (1 active), Execution time: mean = 11.127 us, total = 787.011 ms, Queueing time: mean = 93.029 us, max = 25.869 ms, min = 6.166 us, total = 6.580 s [state-dump] NodeManager.CheckGC - 70731 total (1 active), Execution time: mean = 3.764 us, total = 266.235 ms, Queueing time: mean = 99.528 us, max = 25.875 ms, min = -0.000 s, total = 7.040 s [state-dump] ObjectManager.UpdateAvailableMemory - 70731 total (0 active), Execution time: mean = 6.378 us, total = 451.151 ms, Queueing time: mean = 112.180 us, max = 45.939 ms, min = 2.228 us, total = 7.935 s [state-dump] 
RayletWorkerPool.deadline_timer.kill_idle_workers - 35386 total (1 active), Execution time: mean = 19.154 us, total = 677.797 ms, Queueing time: mean = 78.102 us, max = 26.386 ms, min = -0.001 s, total = 2.764 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 28262 total (1 active), Execution time: mean = 460.173 us, total = 13.005 s, Queueing time: mean = 76.649 us, max = 4.063 ms, min = -0.000 s, total = 2.166 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 7080 total (1 active), Execution time: mean = 9.579 us, total = 67.818 ms, Queueing time: mean = 178.903 us, max = 2.947 ms, min = 2.735 us, total = 1.267 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 7080 total (1 active), Execution time: mean = 17.570 us, total = 124.398 ms, Queueing time: mean = 76.218 us, max = 2.581 ms, min = 7.438 us, total = 539.625 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 7080 total (1 active), Execution time: mean = 3.170 us, total = 22.445 ms, Queueing time: mean = 183.240 us, max = 2.946 ms, min = 3.845 us, total = 1.297 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 7078 total (0 active), Execution time: mean = 613.864 us, total = 4.345 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 7078 total (0 active), Execution time: mean = 97.830 us, total = 692.441 ms, Queueing time: mean = 113.640 us, max = 2.934 ms, min = 4.027 us, total = 804.342 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2361 total (1 active), Execution time: mean = 10.139 us, total = 23.938 ms, Queueing time: mean = 76.435 us, max = 564.135 us, min = 12.181 us, total = 180.463 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1416 total (1 active), Execution time: mean = 585.319 us, total = 828.812 ms, Queueing time: mean = 345.538 us, max = 2.010 ms, min = 9.115 us, total = 489.282 
ms [state-dump] NodeManager.GcsCheckAlive - 1416 total (1 active), Execution time: mean = 322.426 us, total = 456.556 ms, Queueing time: mean = 608.302 us, max = 2.567 ms, min = 6.690 us, total = 861.355 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1416 total (0 active), Execution time: mean = 56.224 us, total = 79.613 ms, Queueing time: mean = 112.682 us, max = 4.779 ms, min = 11.561 us, total = 159.558 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1416 total (0 active), Execution time: mean = 1.597 ms, total = 2.262 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 708 total (1 active), Execution time: mean = 1.786 ms, total = 1.265 s, Queueing time: mean = 72.697 us, max = 248.460 us, min = 11.609 us, total = 51.469 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 118 total (1 active, 1 running), Execution time: mean = 2.753 ms, total = 324.837 ms, Queueing time: mean = 76.284 us, max = 325.100 us, min = 9.635 us, total = 9.001 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, 
total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 13 total (1 active), Execution time: mean = 507.509 s, total = 6597.612 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 12 total (0 active), Execution time: mean = 379.678 us, total = 4.556 ms, Queueing time: mean = 140.234 us, max = 279.406 us, min = 20.299 us, total = 1.683 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 
9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 8 total (1 active), Execution time: mean = 7.585 us, total = 60.683 us, Queueing time: mean = 59.928 us, max = 97.290 us, min = 24.344 us, total = 479.424 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] 
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: 
mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 01:00:54,939 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 01:00:56,264 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 
================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled 
unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: 
[state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative 
unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 623051 total (35 active) [state-dump] Queueing time: mean = 179.722 us, max = 59.826 s, min = -0.001 s, total = 111.976 s [state-dump] Execution time: mean = 10.772 ms, total = 6711.701 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 149898 total (0 active), Execution time: mean = 532.245 us, total = 79.782 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 149898 total (0 active), Execution time: mean = 35.518 us, total = 5.324 s, Queueing time: mean = 111.736 us, max = 3.855 ms, min = 1.846 us, total = 16.749 s [state-dump] RaySyncer.OnDemandBroadcasting - 71331 total (1 active), Execution time: mean = 11.119 us, total = 793.123 ms, Queueing time: mean = 92.944 us, max = 25.869 ms, min = 6.166 us, total = 6.630 s [state-dump] NodeManager.CheckGC - 71331 total (1 active), Execution time: mean = 3.756 us, total = 267.890 ms, Queueing time: mean = 99.443 us, max = 25.875 ms, min = -0.000 s, total = 7.093 s [state-dump] 
ObjectManager.UpdateAvailableMemory - 71331 total (0 active), Execution time: mean = 6.371 us, total = 454.456 ms, Queueing time: mean = 112.074 us, max = 45.939 ms, min = 2.228 us, total = 7.994 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 35685 total (1 active), Execution time: mean = 19.139 us, total = 682.988 ms, Queueing time: mean = 78.004 us, max = 26.386 ms, min = -0.001 s, total = 2.784 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 28501 total (1 active), Execution time: mean = 459.923 us, total = 13.108 s, Queueing time: mean = 76.573 us, max = 4.063 ms, min = -0.000 s, total = 2.182 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 7140 total (1 active), Execution time: mean = 9.574 us, total = 68.361 ms, Queueing time: mean = 178.846 us, max = 2.947 ms, min = 2.735 us, total = 1.277 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 7140 total (1 active), Execution time: mean = 17.561 us, total = 125.384 ms, Queueing time: mean = 76.166 us, max = 2.581 ms, min = 7.438 us, total = 543.822 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 7140 total (1 active), Execution time: mean = 3.170 us, total = 22.631 ms, Queueing time: mean = 183.178 us, max = 2.946 ms, min = 3.845 us, total = 1.308 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 7138 total (0 active), Execution time: mean = 613.190 us, total = 4.377 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 7138 total (0 active), Execution time: mean = 97.752 us, total = 697.753 ms, Queueing time: mean = 113.528 us, max = 2.934 ms, min = 4.027 us, total = 810.366 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2381 total (1 active), Execution time: mean = 10.119 us, total = 24.092 ms, Queueing time: mean = 76.290 us, max = 564.135 us, min = 12.181 us, total = 181.647 ms 
[state-dump] NodeManager.deadline_timer.record_metrics - 1428 total (1 active), Execution time: mean = 585.203 us, total = 835.669 ms, Queueing time: mean = 345.690 us, max = 2.010 ms, min = 9.115 us, total = 493.646 ms [state-dump] NodeManager.GcsCheckAlive - 1428 total (1 active), Execution time: mean = 322.240 us, total = 460.158 ms, Queueing time: mean = 608.196 us, max = 2.567 ms, min = 6.690 us, total = 868.504 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1428 total (0 active), Execution time: mean = 56.192 us, total = 80.242 ms, Queueing time: mean = 112.510 us, max = 4.779 ms, min = 11.561 us, total = 160.664 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1428 total (0 active), Execution time: mean = 1.595 ms, total = 2.278 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 714 total (1 active), Execution time: mean = 1.787 ms, total = 1.276 s, Queueing time: mean = 72.687 us, max = 248.460 us, min = 11.609 us, total = 51.899 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 119 total (1 active, 1 running), Execution time: mean = 2.757 ms, total = 328.111 ms, Queueing time: mean = 76.192 us, max = 325.100 us, min = 9.635 us, total = 9.067 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 
956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 13 total (1 active), Execution time: mean = 507.509 s, total = 6597.612 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 12 total (0 active), Execution time: mean = 379.678 us, total = 4.556 ms, Queueing time: mean = 140.234 us, max = 279.406 us, min = 
20.299 us, total = 1.683 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 8 total (1 active), Execution time: mean = 7.585 us, total = 60.683 us, Queueing time: mean = 59.928 us, max = 97.290 us, min = 24.344 us, total = 479.424 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing 
time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 01:01:54,939 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 01:01:56,268 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, 
node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] 
Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 
[state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 628286 total (35 active) [state-dump] Queueing time: mean = 178.919 us, 
max = 59.826 s, min = -0.001 s, total = 112.412 s [state-dump] Execution time: mean = 11.639 ms, total = 7312.647 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 151158 total (0 active), Execution time: mean = 532.285 us, total = 80.459 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 151158 total (0 active), Execution time: mean = 35.520 us, total = 5.369 s, Queueing time: mean = 111.798 us, max = 3.855 ms, min = 1.846 us, total = 16.899 s [state-dump] RaySyncer.OnDemandBroadcasting - 71930 total (1 active), Execution time: mean = 11.127 us, total = 800.334 ms, Queueing time: mean = 92.986 us, max = 25.869 ms, min = 6.166 us, total = 6.688 s [state-dump] NodeManager.CheckGC - 71930 total (1 active), Execution time: mean = 3.750 us, total = 269.737 ms, Queueing time: mean = 99.498 us, max = 25.875 ms, min = -0.000 s, total = 7.157 s [state-dump] ObjectManager.UpdateAvailableMemory - 71930 total (0 active), Execution time: mean = 6.376 us, total = 458.590 ms, Queueing time: mean = 112.125 us, max = 45.939 ms, min = 2.228 us, total = 8.065 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 35985 total (1 active), Execution time: mean = 19.142 us, total = 688.841 ms, Queueing time: mean = 78.001 us, max = 26.386 ms, min = -0.001 s, total = 2.807 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 28741 total (1 active), Execution time: mean = 460.048 us, total = 13.222 s, Queueing time: mean = 76.587 us, max = 4.063 ms, min = -0.000 s, total = 2.201 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 7200 total (1 active), Execution time: mean = 9.579 us, total = 68.970 ms, Queueing time: mean = 178.922 us, max = 2.947 ms, min = 2.735 us, total = 1.288 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 7200 total (1 active), Execution time: mean = 17.570 us, 
total = 126.502 ms, Queueing time: mean = 76.173 us, max = 2.581 ms, min = 7.438 us, total = 548.444 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 7200 total (1 active), Execution time: mean = 3.170 us, total = 22.827 ms, Queueing time: mean = 183.258 us, max = 2.946 ms, min = 3.845 us, total = 1.319 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 7198 total (0 active), Execution time: mean = 613.195 us, total = 4.414 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 7198 total (0 active), Execution time: mean = 97.762 us, total = 703.690 ms, Queueing time: mean = 113.536 us, max = 2.934 ms, min = 4.027 us, total = 817.230 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2401 total (1 active), Execution time: mean = 10.120 us, total = 24.298 ms, Queueing time: mean = 76.655 us, max = 564.135 us, min = 12.181 us, total = 184.050 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1440 total (1 active), Execution time: mean = 585.227 us, total = 842.727 ms, Queueing time: mean = 346.050 us, max = 2.010 ms, min = 9.115 us, total = 498.312 ms [state-dump] NodeManager.GcsCheckAlive - 1440 total (1 active), Execution time: mean = 322.332 us, total = 464.158 ms, Queueing time: mean = 608.523 us, max = 2.567 ms, min = 6.690 us, total = 876.274 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1440 total (0 active), Execution time: mean = 56.193 us, total = 80.918 ms, Queueing time: mean = 112.560 us, max = 4.779 ms, min = 11.561 us, total = 162.086 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1440 total (0 active), Execution time: mean = 1.595 ms, total = 2.297 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 720 total (1 
active), Execution time: mean = 1.788 ms, total = 1.287 s, Queueing time: mean = 72.795 us, max = 248.460 us, min = 11.609 us, total = 52.413 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 120 total (1 active, 1 running), Execution time: mean = 2.759 ms, total = 331.110 ms, Queueing time: mean = 76.229 us, max = 325.100 us, min = 9.635 us, total = 9.147 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: 
mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 14 total (1 active), Execution time: mean = 514.115 s, total = 7197.613 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 13 total (0 active), Execution time: mean = 380.546 us, total = 4.947 ms, Queueing time: mean = 133.192 us, max = 279.406 us, min = 20.299 us, total = 1.731 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, 
total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 9 total (1 active), Execution time: mean = 7.823 us, total = 70.403 us, Queueing time: mean = 60.810 us, max = 97.290 us, min = 24.344 us, total = 547.290 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 
total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: 
mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 01:02:54,939 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 01:02:56,271 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: 
[21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: 
[state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles 
being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 
[state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 633518 total (35 active) [state-dump] Queueing time: mean = 178.091 us, max = 59.826 s, min = -0.001 s, total = 112.824 s [state-dump] Execution time: mean = 11.544 ms, total = 7313.570 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 152418 total (0 active), Execution time: mean = 532.223 us, total = 81.120 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 152418 total (0 active), Execution time: mean = 35.498 us, total = 5.411 s, Queueing time: mean = 111.782 us, max = 3.855 ms, min = 1.846 us, total = 17.038 s [state-dump] RaySyncer.OnDemandBroadcasting - 72530 total (1 active), Execution time: mean = 11.127 us, total = 807.040 ms, Queueing time: mean = 92.965 us, max = 25.869 ms, min = 6.166 us, total = 6.743 s [state-dump] NodeManager.CheckGC - 72530 total (1 active), Execution time: mean = 3.744 us, total = 271.551 ms, Queueing time: mean = 99.483 us, max = 25.875 ms, min = -0.000 s, total = 7.216 s [state-dump] ObjectManager.UpdateAvailableMemory - 72530 total (0 active), Execution time: mean = 6.377 us, total = 462.555 ms, Queueing time: mean = 112.131 us, max = 45.939 ms, min = 2.228 us, total = 8.133 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 36285 total (1 active), Execution time: mean = 19.154 us, total = 694.986 ms, Queueing time: mean = 78.024 us, max = 26.386 ms, min = -0.001 s, total = 2.831 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 28980 total (1 active), Execution time: mean = 460.083 us, total = 13.333 s, Queueing time: mean = 76.563 us, max = 4.063 ms, min = -0.000 s, total = 2.219 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 7260 total (1 active), Execution time: mean = 17.581 us, total = 127.637 ms, Queueing time: mean = 76.151 us, max = 
2.581 ms, min = 7.438 us, total = 552.856 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 7259 total (1 active), Execution time: mean = 9.586 us, total = 69.584 ms, Queueing time: mean = 179.014 us, max = 2.947 ms, min = 2.735 us, total = 1.299 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 7259 total (1 active), Execution time: mean = 3.170 us, total = 23.010 ms, Queueing time: mean = 183.352 us, max = 2.946 ms, min = 3.845 us, total = 1.331 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 7258 total (0 active), Execution time: mean = 613.371 us, total = 4.452 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 7258 total (0 active), Execution time: mean = 97.745 us, total = 709.435 ms, Queueing time: mean = 113.651 us, max = 2.934 ms, min = 4.027 us, total = 824.880 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2421 total (1 active), Execution time: mean = 10.100 us, total = 24.452 ms, Queueing time: mean = 76.656 us, max = 564.135 us, min = 12.181 us, total = 185.585 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1452 total (1 active), Execution time: mean = 585.366 us, total = 849.951 ms, Queueing time: mean = 346.285 us, max = 2.010 ms, min = 9.115 us, total = 502.805 ms [state-dump] NodeManager.GcsCheckAlive - 1452 total (1 active), Execution time: mean = 322.246 us, total = 467.901 ms, Queueing time: mean = 608.963 us, max = 2.567 ms, min = 6.690 us, total = 884.215 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1452 total (0 active), Execution time: mean = 56.174 us, total = 81.565 ms, Queueing time: mean = 112.523 us, max = 4.779 ms, min = 11.561 us, total = 163.384 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1452 total (0 active), Execution time: mean = 1.594 ms, total = 2.315 s, 
Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 726 total (1 active), Execution time: mean = 1.788 ms, total = 1.298 s, Queueing time: mean = 72.869 us, max = 248.460 us, min = 11.609 us, total = 52.903 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 121 total (1 active, 1 running), Execution time: mean = 2.765 ms, total = 334.579 ms, Queueing time: mean = 76.142 us, max = 325.100 us, min = 9.635 us, total = 9.213 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 
269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 14 total (1 active), Execution time: mean = 514.115 s, total = 7197.613 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 13 total (0 active), Execution time: mean = 380.546 us, total = 4.947 ms, Queueing time: mean = 133.192 us, max = 279.406 us, min = 20.299 us, total = 1.731 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 
us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 9 total (1 active), Execution time: mean = 7.823 us, total = 70.403 us, Queueing time: mean = 60.810 us, max = 97.290 us, min = 24.344 us, total = 547.290 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 
us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 01:03:54,940 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 01:03:56,273 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], 
memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - 
cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A 
[state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active 
subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 638750 total (35 active) [state-dump] Queueing time: mean = 177.295 us, max = 59.826 s, min = -0.001 s, total = 113.247 s [state-dump] Execution time: mean = 11.451 ms, total = 7314.526 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 153678 total (0 active), Execution time: mean = 532.359 us, total = 81.812 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 153678 total (0 active), Execution time: mean = 35.493 us, total = 5.455 s, Queueing time: mean = 111.812 us, max = 3.855 ms, min = 1.846 us, total = 17.183 s [state-dump] RaySyncer.OnDemandBroadcasting - 73129 total (1 active), Execution time: mean = 11.135 us, total = 814.311 ms, Queueing time: mean = 92.974 us, max = 25.869 ms, min = 6.166 us, total = 6.799 s [state-dump] NodeManager.CheckGC - 73129 total (1 active), Execution time: mean = 3.738 us, total = 273.374 ms, Queueing time: mean = 99.505 us, max = 25.875 ms, min = -0.000 s, total = 7.277 s [state-dump] ObjectManager.UpdateAvailableMemory - 73129 total (0 active), Execution time: mean = 6.382 us, total = 466.690 ms, Queueing time: mean = 112.144 us, max = 45.939 ms, min = 2.228 us, total = 8.201 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 36585 total (1 active), Execution time: mean = 19.155 us, total = 700.792 ms, Queueing time: mean = 78.030 us, max = 26.386 ms, min = -0.001 s, total = 2.855 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 29220 total (1 active), Execution time: mean = 460.093 us, total = 13.444 s, Queueing time: mean = 76.572 us, max = 4.063 ms, min = -0.000 s, total = 2.237 s 
[state-dump] NodeManager.ScheduleAndDispatchTasks - 7320 total (1 active), Execution time: mean = 17.591 us, total = 128.767 ms, Queueing time: mean = 76.157 us, max = 2.581 ms, min = 7.438 us, total = 557.469 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 7319 total (1 active), Execution time: mean = 9.596 us, total = 70.233 ms, Queueing time: mean = 179.065 us, max = 2.947 ms, min = 2.735 us, total = 1.311 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 7319 total (1 active), Execution time: mean = 3.170 us, total = 23.204 ms, Queueing time: mean = 183.409 us, max = 2.946 ms, min = 3.845 us, total = 1.342 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 7318 total (0 active), Execution time: mean = 613.604 us, total = 4.490 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 7318 total (0 active), Execution time: mean = 97.748 us, total = 715.322 ms, Queueing time: mean = 113.682 us, max = 2.934 ms, min = 4.027 us, total = 831.924 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2441 total (1 active), Execution time: mean = 10.097 us, total = 24.648 ms, Queueing time: mean = 76.638 us, max = 564.135 us, min = 12.181 us, total = 187.072 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1464 total (1 active), Execution time: mean = 585.150 us, total = 856.660 ms, Queueing time: mean = 346.777 us, max = 2.010 ms, min = 9.115 us, total = 507.681 ms [state-dump] NodeManager.GcsCheckAlive - 1464 total (1 active), Execution time: mean = 322.426 us, total = 472.032 ms, Queueing time: mean = 609.092 us, max = 2.567 ms, min = 6.690 us, total = 891.710 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1464 total (0 active), Execution time: mean = 56.202 us, total = 82.280 ms, Queueing time: mean = 112.568 us, max = 4.779 ms, min = 11.561 
us, total = 164.799 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1464 total (0 active), Execution time: mean = 1.594 ms, total = 2.334 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 732 total (1 active), Execution time: mean = 1.789 ms, total = 1.309 s, Queueing time: mean = 72.998 us, max = 248.460 us, min = 11.609 us, total = 53.434 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 122 total (1 active, 1 running), Execution time: mean = 2.767 ms, total = 337.562 ms, Queueing time: mean = 76.113 us, max = 325.100 us, min = 9.635 us, total = 9.286 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 
s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 14 total (1 active), Execution time: mean = 514.115 s, total = 7197.613 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 13 total (0 active), Execution time: mean = 380.546 us, total = 4.947 ms, Queueing time: mean = 133.192 us, max = 279.406 us, min = 20.299 us, total = 1.731 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] 
NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 9 total (1 active), Execution time: mean = 7.823 us, total = 70.403 us, Queueing time: mean = 60.810 us, max = 97.290 us, min = 24.344 us, total = 547.290 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 
us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived 
- 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 01:04:54,940 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 01:04:56,276 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] 
num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 
0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: 
BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] 
- cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 643981 total (35 active) [state-dump] Queueing time: mean = 176.529 us, max = 59.826 s, min = -0.001 s, total = 113.681 s [state-dump] Execution time: mean = 11.360 ms, total = 7315.478 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 154938 total (0 active), Execution time: mean = 532.442 us, total = 82.495 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 154938 total (0 active), Execution time: mean = 35.504 us, total = 5.501 s, Queueing time: mean = 111.857 us, max = 3.855 ms, min = 1.846 us, total = 17.331 s [state-dump] RaySyncer.OnDemandBroadcasting - 73728 total (1 active), Execution time: mean = 11.147 us, total = 821.872 ms, Queueing time: mean = 93.033 us, max = 25.869 ms, min = 6.166 us, total = 6.859 s [state-dump] NodeManager.CheckGC - 73728 total (1 active), Execution time: mean = 3.733 us, total = 275.258 ms, Queueing time: mean = 99.579 us, max = 25.875 ms, min = -0.000 s, total = 7.342 s [state-dump] ObjectManager.UpdateAvailableMemory - 73728 total (0 active), Execution time: mean = 6.388 us, total = 470.969 ms, Queueing time: mean = 112.176 us, max = 45.939 ms, min = 2.228 us, total = 8.271 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 36885 total (1 active), Execution time: mean = 19.156 us, total = 706.565 ms, Queueing time: mean = 78.047 us, max = 26.386 ms, min = -0.001 s, total = 2.879 s [state-dump] 
MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 29459 total (1 active), Execution time: mean = 460.178 us, total = 13.556 s, Queueing time: mean = 76.571 us, max = 4.063 ms, min = -0.000 s, total = 2.256 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 7380 total (1 active), Execution time: mean = 17.594 us, total = 129.847 ms, Queueing time: mean = 76.163 us, max = 2.581 ms, min = 7.438 us, total = 562.082 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 7379 total (1 active), Execution time: mean = 9.598 us, total = 70.827 ms, Queueing time: mean = 179.079 us, max = 2.947 ms, min = 2.735 us, total = 1.321 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 7379 total (1 active), Execution time: mean = 3.170 us, total = 23.390 ms, Queueing time: mean = 183.424 us, max = 2.946 ms, min = 3.845 us, total = 1.353 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 7378 total (0 active), Execution time: mean = 613.817 us, total = 4.529 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 7378 total (0 active), Execution time: mean = 97.782 us, total = 721.439 ms, Queueing time: mean = 113.758 us, max = 2.934 ms, min = 4.027 us, total = 839.308 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2461 total (1 active), Execution time: mean = 10.101 us, total = 24.858 ms, Queueing time: mean = 76.599 us, max = 564.135 us, min = 12.181 us, total = 188.510 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1476 total (1 active), Execution time: mean = 585.247 us, total = 863.825 ms, Queueing time: mean = 346.810 us, max = 2.010 ms, min = 9.115 us, total = 511.891 ms [state-dump] NodeManager.GcsCheckAlive - 1476 total (1 active), Execution time: mean = 322.416 us, total = 475.887 ms, Queueing time: mean = 609.195 us, max = 2.567 ms, min = 6.690 us, total = 899.172 ms [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1476 total (0 active), Execution time: mean = 56.198 us, total = 82.949 ms, Queueing time: mean = 112.644 us, max = 4.779 ms, min = 11.561 us, total = 166.262 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1476 total (0 active), Execution time: mean = 1.594 ms, total = 2.352 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 738 total (1 active), Execution time: mean = 1.789 ms, total = 1.320 s, Queueing time: mean = 72.954 us, max = 248.460 us, min = 11.609 us, total = 53.840 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 123 total (1 active, 1 running), Execution time: mean = 2.766 ms, total = 340.204 ms, Queueing time: mean = 75.892 us, max = 325.100 us, min = 9.635 us, total = 9.335 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms 
[state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 14 total (1 active), Execution time: mean = 514.115 s, total = 7197.613 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 13 total (0 active), Execution time: mean = 380.546 us, total = 4.947 ms, Queueing time: mean = 133.192 us, max = 279.406 us, min = 20.299 us, total = 1.731 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] 
WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 9 total (1 active), Execution time: mean = 7.823 us, total = 70.403 us, Queueing time: mean = 60.810 us, max = 97.290 us, min = 24.344 us, total = 547.290 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 
117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution 
time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 01:05:54,940 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 01:05:56,279 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 
[state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] 
================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 
total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - 
cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 649216 total (35 active) [state-dump] Queueing time: mean = 175.755 us, max = 59.826 s, min = -0.001 s, total = 114.103 s [state-dump] Execution time: mean = 11.270 ms, total = 7316.425 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 156198 total (0 active), Execution time: mean = 532.490 us, total = 83.174 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 156198 total (0 active), Execution time: mean = 35.516 us, total = 5.548 s, Queueing time: mean = 111.870 us, max = 3.855 ms, min = 1.846 us, total = 17.474 s [state-dump] RaySyncer.OnDemandBroadcasting - 74328 total (1 active), Execution time: mean = 11.149 us, total = 828.702 ms, Queueing time: mean = 93.034 us, max = 25.869 ms, min = 6.166 us, total = 6.915 s [state-dump] NodeManager.CheckGC - 74328 total (1 active), Execution time: mean = 3.727 us, total = 277.057 ms, Queueing time: mean = 99.588 us, max = 25.875 ms, min = -0.000 s, total = 7.402 s [state-dump] ObjectManager.UpdateAvailableMemory - 74328 total (0 active), Execution time: mean = 6.390 us, total = 474.973 ms, Queueing time: mean = 112.187 us, max = 45.939 ms, min = 2.228 us, total = 8.339 s [state-dump] 
RayletWorkerPool.deadline_timer.kill_idle_workers - 37185 total (1 active), Execution time: mean = 19.157 us, total = 712.365 ms, Queueing time: mean = 78.066 us, max = 26.386 ms, min = -0.001 s, total = 2.903 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 29699 total (1 active), Execution time: mean = 460.244 us, total = 13.669 s, Queueing time: mean = 76.588 us, max = 4.063 ms, min = -0.000 s, total = 2.275 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 7440 total (1 active), Execution time: mean = 17.598 us, total = 130.927 ms, Queueing time: mean = 76.163 us, max = 2.581 ms, min = 7.438 us, total = 566.655 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 7439 total (1 active), Execution time: mean = 9.601 us, total = 71.418 ms, Queueing time: mean = 179.195 us, max = 2.947 ms, min = 2.735 us, total = 1.333 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 7439 total (1 active), Execution time: mean = 3.172 us, total = 23.594 ms, Queueing time: mean = 183.539 us, max = 2.946 ms, min = 3.845 us, total = 1.365 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 7438 total (0 active), Execution time: mean = 614.179 us, total = 4.568 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 7438 total (0 active), Execution time: mean = 97.822 us, total = 727.600 ms, Queueing time: mean = 113.824 us, max = 2.934 ms, min = 4.027 us, total = 846.622 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2481 total (1 active), Execution time: mean = 10.099 us, total = 25.056 ms, Queueing time: mean = 76.569 us, max = 564.135 us, min = 12.181 us, total = 189.967 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1488 total (1 active), Execution time: mean = 585.307 us, total = 870.936 ms, Queueing time: mean = 347.307 us, max = 2.010 ms, min = 9.115 us, total = 516.792 
ms [state-dump] NodeManager.GcsCheckAlive - 1488 total (1 active), Execution time: mean = 322.500 us, total = 479.880 ms, Queueing time: mean = 609.680 us, max = 2.567 ms, min = 6.690 us, total = 907.204 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1488 total (0 active), Execution time: mean = 56.198 us, total = 83.622 ms, Queueing time: mean = 112.519 us, max = 4.779 ms, min = 11.561 us, total = 167.428 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1488 total (0 active), Execution time: mean = 1.593 ms, total = 2.370 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 744 total (1 active), Execution time: mean = 1.790 ms, total = 1.332 s, Queueing time: mean = 72.961 us, max = 248.460 us, min = 11.609 us, total = 54.283 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 124 total (1 active, 1 running), Execution time: mean = 2.767 ms, total = 343.133 ms, Queueing time: mean = 75.957 us, max = 325.100 us, min = 9.635 us, total = 9.419 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, 
total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 14 total (1 active), Execution time: mean = 514.115 s, total = 7197.613 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 13 total (0 active), Execution time: mean = 380.546 us, total = 4.947 ms, Queueing time: mean = 133.192 us, max = 279.406 us, min = 20.299 us, total = 1.731 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 
9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 9 total (1 active), Execution time: mean = 7.823 us, total = 70.403 us, Queueing time: mean = 60.810 us, max = 97.290 us, min = 24.344 us, total = 547.290 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] 
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: 
mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 01:06:54,940 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 01:06:56,282 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 
================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled 
unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: 
[state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative 
unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 654447 total (35 active) [state-dump] Queueing time: mean = 174.903 us, max = 59.826 s, min = -0.001 s, total = 114.465 s [state-dump] Execution time: mean = 11.181 ms, total = 7317.300 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 157458 total (0 active), Execution time: mean = 532.191 us, total = 83.798 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 157458 total (0 active), Execution time: mean = 35.488 us, total = 5.588 s, Queueing time: mean = 111.705 us, max = 3.855 ms, min = 1.846 us, total = 17.589 s [state-dump] RaySyncer.OnDemandBroadcasting - 74927 total (1 active), Execution time: mean = 11.144 us, total = 834.991 ms, Queueing time: mean = 92.953 us, max = 25.869 ms, min = 6.166 us, total = 6.965 s [state-dump] NodeManager.CheckGC - 74927 total (1 active), Execution time: mean = 3.721 us, total = 278.787 ms, Queueing time: mean = 99.509 us, max = 25.875 ms, min = -0.000 s, total = 7.456 s [state-dump] 
ObjectManager.UpdateAvailableMemory - 74927 total (0 active), Execution time: mean = 6.386 us, total = 478.473 ms, Queueing time: mean = 112.070 us, max = 45.939 ms, min = 2.228 us, total = 8.397 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 37485 total (1 active), Execution time: mean = 19.145 us, total = 717.646 ms, Queueing time: mean = 78.015 us, max = 26.386 ms, min = -0.001 s, total = 2.924 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 29938 total (1 active), Execution time: mean = 460.180 us, total = 13.777 s, Queueing time: mean = 76.567 us, max = 4.063 ms, min = -0.000 s, total = 2.292 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 7500 total (1 active), Execution time: mean = 17.585 us, total = 131.885 ms, Queueing time: mean = 76.125 us, max = 2.581 ms, min = 7.438 us, total = 570.941 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 7499 total (1 active), Execution time: mean = 9.599 us, total = 71.981 ms, Queueing time: mean = 179.155 us, max = 2.947 ms, min = 2.735 us, total = 1.343 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 7499 total (1 active), Execution time: mean = 3.170 us, total = 23.773 ms, Queueing time: mean = 183.498 us, max = 2.946 ms, min = 3.845 us, total = 1.376 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 7498 total (0 active), Execution time: mean = 614.082 us, total = 4.604 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 7498 total (0 active), Execution time: mean = 97.780 us, total = 733.152 ms, Queueing time: mean = 113.679 us, max = 2.934 ms, min = 4.027 us, total = 852.363 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2501 total (1 active), Execution time: mean = 10.102 us, total = 25.266 ms, Queueing time: mean = 76.466 us, max = 564.135 us, min = 12.181 us, total = 191.241 ms 
[state-dump] NodeManager.deadline_timer.record_metrics - 1500 total (1 active), Execution time: mean = 585.116 us, total = 877.674 ms, Queueing time: mean = 347.371 us, max = 2.010 ms, min = 9.115 us, total = 521.056 ms [state-dump] NodeManager.GcsCheckAlive - 1500 total (1 active), Execution time: mean = 322.273 us, total = 483.410 ms, Queueing time: mean = 609.758 us, max = 2.567 ms, min = 6.690 us, total = 914.636 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1500 total (0 active), Execution time: mean = 56.199 us, total = 84.299 ms, Queueing time: mean = 112.271 us, max = 4.779 ms, min = 11.561 us, total = 168.407 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1500 total (0 active), Execution time: mean = 1.592 ms, total = 2.387 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 750 total (1 active), Execution time: mean = 1.790 ms, total = 1.343 s, Queueing time: mean = 72.927 us, max = 248.460 us, min = 11.609 us, total = 54.695 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 125 total (1 active, 1 running), Execution time: mean = 2.767 ms, total = 345.852 ms, Queueing time: mean = 75.814 us, max = 325.100 us, min = 9.635 us, total = 9.477 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 
956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 14 total (1 active), Execution time: mean = 514.115 s, total = 7197.613 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 13 total (0 active), Execution time: mean = 380.546 us, total = 4.947 ms, Queueing time: mean = 133.192 us, max = 279.406 us, min = 20.299 us, total = 1.731 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 
28.359 us, total = 40.413 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 9 total (1 active), Execution time: mean = 7.823 us, total = 70.403 us, Queueing time: mean = 60.810 us, max = 97.290 us, min = 24.344 us, total = 547.290 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing 
time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 01:07:54,941 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 01:07:56,285 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, 
node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] 
Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 
[state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 659682 total (35 active) [state-dump] Queueing time: mean = 174.159 us, 
max = 59.826 s, min = -0.001 s, total = 114.890 s [state-dump] Execution time: mean = 11.094 ms, total = 7318.250 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 158718 total (0 active), Execution time: mean = 532.274 us, total = 84.481 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 158718 total (0 active), Execution time: mean = 35.488 us, total = 5.633 s, Queueing time: mean = 111.710 us, max = 3.855 ms, min = 1.846 us, total = 17.730 s [state-dump] RaySyncer.OnDemandBroadcasting - 75527 total (1 active), Execution time: mean = 11.141 us, total = 841.484 ms, Queueing time: mean = 92.956 us, max = 25.869 ms, min = 6.166 us, total = 7.021 s [state-dump] NodeManager.CheckGC - 75527 total (1 active), Execution time: mean = 3.714 us, total = 280.517 ms, Queueing time: mean = 99.516 us, max = 25.875 ms, min = -0.000 s, total = 7.516 s [state-dump] ObjectManager.UpdateAvailableMemory - 75527 total (0 active), Execution time: mean = 6.387 us, total = 482.385 ms, Queueing time: mean = 112.128 us, max = 45.939 ms, min = 2.228 us, total = 8.469 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 37785 total (1 active), Execution time: mean = 19.140 us, total = 723.202 ms, Queueing time: mean = 77.990 us, max = 26.386 ms, min = -0.001 s, total = 2.947 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 30178 total (1 active), Execution time: mean = 460.220 us, total = 13.889 s, Queueing time: mean = 76.583 us, max = 4.063 ms, min = -0.000 s, total = 2.311 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 7560 total (1 active), Execution time: mean = 17.591 us, total = 132.989 ms, Queueing time: mean = 76.187 us, max = 2.581 ms, min = 7.438 us, total = 575.974 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 7559 total (1 active), Execution time: mean = 9.598 
us, total = 72.550 ms, Queueing time: mean = 179.336 us, max = 2.947 ms, min = 2.735 us, total = 1.356 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 7559 total (1 active), Execution time: mean = 3.169 us, total = 23.954 ms, Queueing time: mean = 183.680 us, max = 2.946 ms, min = 3.845 us, total = 1.388 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 7558 total (0 active), Execution time: mean = 614.336 us, total = 4.643 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 7558 total (0 active), Execution time: mean = 97.780 us, total = 739.023 ms, Queueing time: mean = 113.732 us, max = 2.934 ms, min = 4.027 us, total = 859.584 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2521 total (1 active), Execution time: mean = 10.104 us, total = 25.473 ms, Queueing time: mean = 76.460 us, max = 564.135 us, min = 12.181 us, total = 192.755 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1512 total (1 active), Execution time: mean = 585.190 us, total = 884.807 ms, Queueing time: mean = 348.179 us, max = 2.010 ms, min = 9.115 us, total = 526.447 ms [state-dump] NodeManager.GcsCheckAlive - 1512 total (1 active), Execution time: mean = 322.098 us, total = 487.012 ms, Queueing time: mean = 610.831 us, max = 2.567 ms, min = 6.690 us, total = 923.577 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1512 total (0 active), Execution time: mean = 56.200 us, total = 84.975 ms, Queueing time: mean = 112.322 us, max = 4.779 ms, min = 11.561 us, total = 169.832 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1512 total (0 active), Execution time: mean = 1.591 ms, total = 2.406 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 756 total (1 
active), Execution time: mean = 1.792 ms, total = 1.354 s, Queueing time: mean = 72.917 us, max = 248.460 us, min = 11.609 us, total = 55.125 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 126 total (1 active, 1 running), Execution time: mean = 2.768 ms, total = 348.708 ms, Queueing time: mean = 75.850 us, max = 325.100 us, min = 9.635 us, total = 9.557 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: 
mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 14 total (1 active), Execution time: mean = 514.115 s, total = 7197.613 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 13 total (0 active), Execution time: mean = 380.546 us, total = 4.947 ms, Queueing time: mean = 133.192 us, max = 279.406 us, min = 20.299 us, total = 1.731 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, 
total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 9 total (1 active), Execution time: mean = 7.823 us, total = 70.403 us, Queueing time: mean = 60.810 us, max = 97.290 us, min = 24.344 us, total = 547.290 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 
total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: 
mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 01:08:54,941 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 01:08:56,288 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: 
[21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: 
[state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles 
being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 
[state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 664912 total (35 active) [state-dump] Queueing time: mean = 173.432 us, max = 59.826 s, min = -0.001 s, total = 115.317 s [state-dump] Execution time: mean = 11.008 ms, total = 7319.202 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 159978 total (0 active), Execution time: mean = 532.371 us, total = 85.168 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 159978 total (0 active), Execution time: mean = 35.503 us, total = 5.680 s, Queueing time: mean = 111.748 us, max = 3.855 ms, min = 1.846 us, total = 17.877 s [state-dump] RaySyncer.OnDemandBroadcasting - 76126 total (1 active), Execution time: mean = 11.144 us, total = 848.370 ms, Queueing time: mean = 92.954 us, max = 25.869 ms, min = 6.166 us, total = 7.076 s [state-dump] NodeManager.CheckGC - 76126 total (1 active), Execution time: mean = 3.708 us, total = 282.310 ms, Queueing time: mean = 99.521 us, max = 25.875 ms, min = -0.000 s, total = 7.576 s [state-dump] ObjectManager.UpdateAvailableMemory - 76126 total (0 active), Execution time: mean = 6.390 us, total = 486.425 ms, Queueing time: mean = 112.153 us, max = 45.939 ms, min = 2.228 us, total = 8.538 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 38084 total (1 active), Execution time: mean = 19.149 us, total = 729.289 ms, Queueing time: mean = 78.069 us, max = 26.386 ms, min = -0.001 s, total = 2.973 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 30417 total (1 active), Execution time: mean = 460.265 us, total = 14.000 s, Queueing time: mean = 76.644 us, max = 4.063 ms, min = -0.000 s, total = 2.331 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 7620 total (1 active), Execution time: mean = 17.592 us, total = 134.052 ms, Queueing time: mean = 76.195 us, max = 
2.581 ms, min = 7.438 us, total = 580.607 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 7619 total (1 active), Execution time: mean = 9.596 us, total = 73.111 ms, Queueing time: mean = 179.391 us, max = 2.947 ms, min = 2.735 us, total = 1.367 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 7619 total (1 active), Execution time: mean = 3.167 us, total = 24.131 ms, Queueing time: mean = 183.736 us, max = 2.946 ms, min = 3.845 us, total = 1.400 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 7618 total (0 active), Execution time: mean = 614.411 us, total = 4.681 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 7618 total (0 active), Execution time: mean = 97.790 us, total = 744.963 ms, Queueing time: mean = 113.691 us, max = 2.934 ms, min = 4.027 us, total = 866.101 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2541 total (1 active), Execution time: mean = 10.103 us, total = 25.672 ms, Queueing time: mean = 76.425 us, max = 564.135 us, min = 12.181 us, total = 194.197 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1524 total (1 active), Execution time: mean = 585.445 us, total = 892.219 ms, Queueing time: mean = 348.135 us, max = 2.010 ms, min = 9.115 us, total = 530.557 ms [state-dump] NodeManager.GcsCheckAlive - 1524 total (1 active), Execution time: mean = 321.881 us, total = 490.546 ms, Queueing time: mean = 611.287 us, max = 2.567 ms, min = 6.690 us, total = 931.602 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1524 total (0 active), Execution time: mean = 56.194 us, total = 85.640 ms, Queueing time: mean = 112.372 us, max = 4.779 ms, min = 11.561 us, total = 171.255 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1524 total (0 active), Execution time: mean = 1.591 ms, total = 2.424 s, 
Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 762 total (1 active), Execution time: mean = 1.792 ms, total = 1.365 s, Queueing time: mean = 72.927 us, max = 248.460 us, min = 11.609 us, total = 55.571 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 127 total (1 active, 1 running), Execution time: mean = 2.767 ms, total = 351.420 ms, Queueing time: mean = 75.668 us, max = 325.100 us, min = 9.635 us, total = 9.610 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 
269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 14 total (1 active), Execution time: mean = 514.115 s, total = 7197.613 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 13 total (0 active), Execution time: mean = 380.546 us, total = 4.947 ms, Queueing time: mean = 133.192 us, max = 279.406 us, min = 20.299 us, total = 1.731 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 
us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 9 total (1 active), Execution time: mean = 7.823 us, total = 70.403 us, Queueing time: mean = 60.810 us, max = 97.290 us, min = 24.344 us, total = 547.290 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 
us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 01:09:54,941 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 01:09:56,291 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], 
memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - 
cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A 
[state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active 
subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 670142 total (35 active) [state-dump] Queueing time: mean = 172.742 us, max = 59.826 s, min = -0.001 s, total = 115.762 s [state-dump] Execution time: mean = 10.923 ms, total = 7320.173 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 161237 total (0 active), Execution time: mean = 532.573 us, total = 85.870 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 161237 total (0 active), Execution time: mean = 35.506 us, total = 5.725 s, Queueing time: mean = 111.834 us, max = 3.855 ms, min = 1.846 us, total = 18.032 s [state-dump] RaySyncer.OnDemandBroadcasting - 76725 total (1 active), Execution time: mean = 11.149 us, total = 855.399 ms, Queueing time: mean = 92.986 us, max = 25.869 ms, min = 6.166 us, total = 7.134 s [state-dump] NodeManager.CheckGC - 76725 total (1 active), Execution time: mean = 3.703 us, total = 284.098 ms, Queueing time: mean = 99.563 us, max = 25.875 ms, min = -0.000 s, total = 7.639 s [state-dump] ObjectManager.UpdateAvailableMemory - 76725 total (0 active), Execution time: mean = 6.393 us, total = 490.494 ms, Queueing time: mean = 112.242 us, max = 45.939 ms, min = 2.228 us, total = 8.612 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 38384 total (1 active), Execution time: mean = 19.158 us, total = 735.348 ms, Queueing time: mean = 78.097 us, max = 26.386 ms, min = -0.001 s, total = 2.998 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 30657 total (1 active), Execution time: mean = 460.399 us, total = 14.114 s, Queueing time: mean = 76.681 us, max = 4.063 ms, min = -0.000 s, total = 2.351 s 
[state-dump] NodeManager.ScheduleAndDispatchTasks - 7680 total (1 active), Execution time: mean = 17.601 us, total = 135.177 ms, Queueing time: mean = 76.210 us, max = 2.581 ms, min = 7.438 us, total = 585.295 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 7679 total (1 active), Execution time: mean = 9.601 us, total = 73.726 ms, Queueing time: mean = 179.454 us, max = 2.947 ms, min = 2.735 us, total = 1.378 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 7679 total (1 active), Execution time: mean = 3.167 us, total = 24.322 ms, Queueing time: mean = 183.802 us, max = 2.946 ms, min = 3.845 us, total = 1.411 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 7678 total (0 active), Execution time: mean = 614.447 us, total = 4.718 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 7678 total (0 active), Execution time: mean = 97.803 us, total = 750.928 ms, Queueing time: mean = 113.765 us, max = 2.934 ms, min = 4.027 us, total = 873.489 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2561 total (1 active), Execution time: mean = 10.098 us, total = 25.861 ms, Queueing time: mean = 76.450 us, max = 564.135 us, min = 12.181 us, total = 195.788 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1536 total (1 active), Execution time: mean = 585.527 us, total = 899.369 ms, Queueing time: mean = 348.388 us, max = 2.010 ms, min = 9.115 us, total = 535.124 ms [state-dump] NodeManager.GcsCheckAlive - 1536 total (1 active), Execution time: mean = 321.822 us, total = 494.319 ms, Queueing time: mean = 611.700 us, max = 2.567 ms, min = 6.690 us, total = 939.571 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1536 total (0 active), Execution time: mean = 56.183 us, total = 86.297 ms, Queueing time: mean = 112.570 us, max = 4.779 ms, min = 11.561 
us, total = 172.908 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1536 total (0 active), Execution time: mean = 1.591 ms, total = 2.443 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 768 total (1 active), Execution time: mean = 1.792 ms, total = 1.377 s, Queueing time: mean = 72.991 us, max = 248.460 us, min = 11.609 us, total = 56.057 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 128 total (1 active, 1 running), Execution time: mean = 2.766 ms, total = 354.091 ms, Queueing time: mean = 75.959 us, max = 325.100 us, min = 9.635 us, total = 9.723 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 
s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 14 total (1 active), Execution time: mean = 514.115 s, total = 7197.613 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 13 total (0 active), Execution time: mean = 380.546 us, total = 4.947 ms, Queueing time: mean = 133.192 us, max = 279.406 us, min = 20.299 us, total = 1.731 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] 
NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 9 total (1 active), Execution time: mean = 7.823 us, total = 70.403 us, Queueing time: mean = 60.810 us, max = 97.290 us, min = 24.344 us, total = 547.290 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 
us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived 
- 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 01:10:54,942 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 01:10:56,293 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] 
num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 
0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: 
BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] 
- cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 675374 total (35 active) [state-dump] Queueing time: mean = 172.048 us, max = 59.826 s, min = -0.001 s, total = 116.197 s [state-dump] Execution time: mean = 10.840 ms, total = 7321.113 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 162496 total (0 active), Execution time: mean = 532.562 us, total = 86.539 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 162496 total (0 active), Execution time: mean = 35.510 us, total = 5.770 s, Queueing time: mean = 111.858 us, max = 3.855 ms, min = 1.846 us, total = 18.177 s [state-dump] RaySyncer.OnDemandBroadcasting - 77325 total (1 active), Execution time: mean = 11.162 us, total = 863.131 ms, Queueing time: mean = 93.055 us, max = 25.869 ms, min = 6.166 us, total = 7.195 s [state-dump] NodeManager.CheckGC - 77325 total (1 active), Execution time: mean = 3.698 us, total = 285.960 ms, Queueing time: mean = 99.649 us, max = 25.875 ms, min = -0.000 s, total = 7.705 s [state-dump] ObjectManager.UpdateAvailableMemory - 77325 total (0 active), Execution time: mean = 6.398 us, total = 494.756 ms, Queueing time: mean = 112.251 us, max = 45.939 ms, min = 2.228 us, total = 8.680 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 38684 total (1 active), Execution time: mean = 19.165 us, total = 741.386 ms, Queueing time: mean = 78.089 us, max = 26.386 ms, min = -0.001 s, total = 3.021 s [state-dump] 
MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 30896 total (1 active), Execution time: mean = 460.497 us, total = 14.228 s, Queueing time: mean = 76.688 us, max = 4.063 ms, min = -0.000 s, total = 2.369 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 7740 total (1 active), Execution time: mean = 17.614 us, total = 136.332 ms, Queueing time: mean = 76.241 us, max = 2.581 ms, min = 7.438 us, total = 590.105 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 7739 total (1 active), Execution time: mean = 9.605 us, total = 74.330 ms, Queueing time: mean = 179.554 us, max = 2.947 ms, min = 2.735 us, total = 1.390 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 7739 total (1 active), Execution time: mean = 3.167 us, total = 24.510 ms, Queueing time: mean = 183.903 us, max = 2.946 ms, min = 3.845 us, total = 1.423 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 7738 total (0 active), Execution time: mean = 614.849 us, total = 4.758 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 7738 total (0 active), Execution time: mean = 97.839 us, total = 757.078 ms, Queueing time: mean = 113.932 us, max = 2.934 ms, min = 4.027 us, total = 881.610 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2581 total (1 active), Execution time: mean = 10.097 us, total = 26.061 ms, Queueing time: mean = 76.425 us, max = 564.135 us, min = 12.181 us, total = 197.252 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1548 total (1 active), Execution time: mean = 585.418 us, total = 906.227 ms, Queueing time: mean = 349.061 us, max = 2.010 ms, min = 9.115 us, total = 540.346 ms [state-dump] NodeManager.GcsCheckAlive - 1548 total (1 active), Execution time: mean = 321.782 us, total = 498.118 ms, Queueing time: mean = 612.255 us, max = 2.567 ms, min = 6.690 us, total = 947.770 ms [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1548 total (0 active), Execution time: mean = 56.200 us, total = 86.998 ms, Queueing time: mean = 112.605 us, max = 4.779 ms, min = 11.561 us, total = 174.313 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1548 total (0 active), Execution time: mean = 1.590 ms, total = 2.462 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 774 total (1 active), Execution time: mean = 1.794 ms, total = 1.389 s, Queueing time: mean = 73.091 us, max = 248.460 us, min = 11.609 us, total = 56.572 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 129 total (1 active, 1 running), Execution time: mean = 2.767 ms, total = 356.980 ms, Queueing time: mean = 76.455 us, max = 325.100 us, min = 9.635 us, total = 9.863 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms 
[state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 14 total (1 active), Execution time: mean = 514.115 s, total = 7197.613 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 13 total (0 active), Execution time: mean = 380.546 us, total = 4.947 ms, Queueing time: mean = 133.192 us, max = 279.406 us, min = 20.299 us, total = 1.731 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] 
WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 9 total (1 active), Execution time: mean = 7.823 us, total = 70.403 us, Queueing time: mean = 60.810 us, max = 97.290 us, min = 24.344 us, total = 547.290 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 
117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution 
time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 01:11:54,942 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 01:11:56,296 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 
[state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] 
================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 
total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - 
cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 680567 total (35 active) [state-dump] Queueing time: mean = 171.366 us, max = 59.826 s, min = -0.001 s, total = 116.626 s [state-dump] Execution time: mean = 11.640 ms, total = 7922.051 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 163737 total (0 active), Execution time: mean = 532.630 us, total = 87.211 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 163737 total (0 active), Execution time: mean = 35.510 us, total = 5.814 s, Queueing time: mean = 111.896 us, max = 3.855 ms, min = 1.846 us, total = 18.322 s [state-dump] RaySyncer.OnDemandBroadcasting - 77924 total (1 active), Execution time: mean = 11.167 us, total = 870.175 ms, Queueing time: mean = 93.087 us, max = 25.869 ms, min = 6.166 us, total = 7.254 s [state-dump] NodeManager.CheckGC - 77924 total (1 active), Execution time: mean = 3.694 us, total = 287.860 ms, Queueing time: mean = 99.689 us, max = 25.875 ms, min = -0.000 s, total = 7.768 s [state-dump] ObjectManager.UpdateAvailableMemory - 77924 total (0 active), Execution time: mean = 6.403 us, total = 498.960 ms, Queueing time: mean = 112.313 us, max = 45.939 ms, min = 2.228 us, total = 8.752 s [state-dump] 
RayletWorkerPool.deadline_timer.kill_idle_workers - 38984 total (1 active), Execution time: mean = 19.178 us, total = 747.643 ms, Queueing time: mean = 78.167 us, max = 26.386 ms, min = -0.001 s, total = 3.047 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 31135 total (1 active), Execution time: mean = 460.622 us, total = 14.341 s, Queueing time: mean = 76.700 us, max = 4.063 ms, min = -0.000 s, total = 2.388 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 7800 total (1 active), Execution time: mean = 17.630 us, total = 137.516 ms, Queueing time: mean = 76.256 us, max = 2.581 ms, min = 7.438 us, total = 594.797 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 7799 total (1 active), Execution time: mean = 9.607 us, total = 74.926 ms, Queueing time: mean = 179.423 us, max = 2.947 ms, min = 2.735 us, total = 1.399 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 7799 total (1 active), Execution time: mean = 3.167 us, total = 24.699 ms, Queueing time: mean = 183.776 us, max = 2.946 ms, min = 3.845 us, total = 1.433 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 7797 total (0 active), Execution time: mean = 615.095 us, total = 4.796 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 7797 total (0 active), Execution time: mean = 97.849 us, total = 762.929 ms, Queueing time: mean = 114.057 us, max = 2.934 ms, min = 4.027 us, total = 889.305 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2601 total (1 active), Execution time: mean = 10.110 us, total = 26.295 ms, Queueing time: mean = 76.468 us, max = 564.135 us, min = 12.181 us, total = 198.895 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1560 total (1 active), Execution time: mean = 585.199 us, total = 912.911 ms, Queueing time: mean = 348.579 us, max = 2.010 ms, min = 9.115 us, total = 543.783 
ms [state-dump] NodeManager.GcsCheckAlive - 1560 total (1 active), Execution time: mean = 321.657 us, total = 501.785 ms, Queueing time: mean = 611.718 us, max = 2.567 ms, min = 6.690 us, total = 954.280 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1560 total (0 active), Execution time: mean = 56.188 us, total = 87.653 ms, Queueing time: mean = 112.638 us, max = 4.779 ms, min = 11.561 us, total = 175.716 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1560 total (0 active), Execution time: mean = 1.590 ms, total = 2.480 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 780 total (1 active), Execution time: mean = 1.792 ms, total = 1.398 s, Queueing time: mean = 72.977 us, max = 248.460 us, min = 11.609 us, total = 56.922 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 130 total (1 active, 1 running), Execution time: mean = 2.763 ms, total = 359.205 ms, Queueing time: mean = 76.096 us, max = 325.100 us, min = 9.635 us, total = 9.892 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, 
total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 15 total (1 active), Execution time: mean = 519.841 s, total = 7797.614 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 14 total (0 active), Execution time: mean = 384.839 us, total = 5.388 ms, Queueing time: mean = 129.434 us, max = 279.406 us, min = 20.299 us, total = 1.812 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 
9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 9 total (1 active), Execution time: mean = 7.823 us, total = 70.403 us, Queueing time: mean = 60.810 us, max = 97.290 us, min = 24.344 us, total = 547.290 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] 
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: 
mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 01:12:54,942 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 01:12:56,299 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 
================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled 
unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: 
[state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative 
unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 685802 total (35 active) [state-dump] Queueing time: mean = 170.718 us, max = 59.826 s, min = -0.001 s, total = 117.078 s [state-dump] Execution time: mean = 11.553 ms, total = 7923.041 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 164997 total (0 active), Execution time: mean = 532.913 us, total = 87.929 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 164997 total (0 active), Execution time: mean = 35.523 us, total = 5.861 s, Queueing time: mean = 112.019 us, max = 3.855 ms, min = 1.846 us, total = 18.483 s [state-dump] RaySyncer.OnDemandBroadcasting - 78524 total (1 active), Execution time: mean = 11.180 us, total = 877.872 ms, Queueing time: mean = 93.144 us, max = 25.869 ms, min = 6.166 us, total = 7.314 s [state-dump] NodeManager.CheckGC - 78524 total (1 active), Execution time: mean = 3.689 us, total = 289.705 ms, Queueing time: mean = 99.761 us, max = 25.875 ms, min = -0.000 s, total = 7.834 s [state-dump] 
ObjectManager.UpdateAvailableMemory - 78524 total (0 active), Execution time: mean = 6.410 us, total = 503.365 ms, Queueing time: mean = 112.404 us, max = 45.939 ms, min = 2.228 us, total = 8.826 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 39284 total (1 active), Execution time: mean = 19.188 us, total = 753.795 ms, Queueing time: mean = 78.198 us, max = 26.386 ms, min = -0.001 s, total = 3.072 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 31375 total (1 active), Execution time: mean = 460.710 us, total = 14.455 s, Queueing time: mean = 76.777 us, max = 4.063 ms, min = -0.000 s, total = 2.409 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 7860 total (1 active), Execution time: mean = 17.641 us, total = 138.655 ms, Queueing time: mean = 76.283 us, max = 2.581 ms, min = 7.438 us, total = 599.581 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 7859 total (1 active), Execution time: mean = 9.610 us, total = 75.523 ms, Queueing time: mean = 179.311 us, max = 2.947 ms, min = 2.735 us, total = 1.409 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 7859 total (1 active), Execution time: mean = 3.167 us, total = 24.893 ms, Queueing time: mean = 183.664 us, max = 2.946 ms, min = 3.845 us, total = 1.443 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 7857 total (0 active), Execution time: mean = 615.625 us, total = 4.837 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 7857 total (0 active), Execution time: mean = 97.944 us, total = 769.548 ms, Queueing time: mean = 114.086 us, max = 2.934 ms, min = 4.027 us, total = 896.375 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2621 total (1 active), Execution time: mean = 10.113 us, total = 26.507 ms, Queueing time: mean = 76.507 us, max = 564.135 us, min = 12.181 us, total = 200.525 ms 
[state-dump] NodeManager.deadline_timer.record_metrics - 1572 total (1 active), Execution time: mean = 585.473 us, total = 920.364 ms, Queueing time: mean = 347.806 us, max = 2.010 ms, min = 9.115 us, total = 546.751 ms [state-dump] NodeManager.GcsCheckAlive - 1572 total (1 active), Execution time: mean = 321.500 us, total = 505.399 ms, Queueing time: mean = 611.375 us, max = 2.567 ms, min = 6.690 us, total = 961.082 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1572 total (0 active), Execution time: mean = 56.205 us, total = 88.355 ms, Queueing time: mean = 112.720 us, max = 4.779 ms, min = 11.561 us, total = 177.196 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1572 total (0 active), Execution time: mean = 1.589 ms, total = 2.498 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 786 total (1 active), Execution time: mean = 1.792 ms, total = 1.408 s, Queueing time: mean = 73.105 us, max = 248.460 us, min = 11.609 us, total = 57.461 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 131 total (1 active, 1 running), Execution time: mean = 2.766 ms, total = 362.292 ms, Queueing time: mean = 76.361 us, max = 325.100 us, min = 9.635 us, total = 10.003 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 
956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 15 total (1 active), Execution time: mean = 519.841 s, total = 7797.614 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 14 total (0 active), Execution time: mean = 384.839 us, total = 5.388 ms, Queueing time: mean = 129.434 us, max = 279.406 us, min = 20.299 us, total = 1.812 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 
28.359 us, total = 40.413 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 9 total (1 active), Execution time: mean = 7.823 us, total = 70.403 us, Queueing time: mean = 60.810 us, max = 97.290 us, min = 24.344 us, total = 547.290 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing 
time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 01:13:54,942 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 01:13:56,302 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, 
node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] 
Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 
[state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 691033 total (35 active) [state-dump] Queueing time: mean = 170.056 us, 
max = 59.826 s, min = -0.001 s, total = 117.514 s [state-dump] Execution time: mean = 11.467 ms, total = 7924.023 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 166257 total (0 active), Execution time: mean = 533.170 us, total = 88.643 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 166257 total (0 active), Execution time: mean = 35.529 us, total = 5.907 s, Queueing time: mean = 112.127 us, max = 3.855 ms, min = 1.846 us, total = 18.642 s [state-dump] RaySyncer.OnDemandBroadcasting - 79123 total (1 active), Execution time: mean = 11.185 us, total = 884.978 ms, Queueing time: mean = 93.150 us, max = 25.869 ms, min = 6.166 us, total = 7.370 s [state-dump] NodeManager.CheckGC - 79123 total (1 active), Execution time: mean = 3.685 us, total = 291.570 ms, Queueing time: mean = 99.777 us, max = 25.875 ms, min = -0.000 s, total = 7.895 s [state-dump] ObjectManager.UpdateAvailableMemory - 79123 total (0 active), Execution time: mean = 6.415 us, total = 507.555 ms, Queueing time: mean = 112.460 us, max = 45.939 ms, min = 2.228 us, total = 8.898 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 39584 total (1 active), Execution time: mean = 19.189 us, total = 759.568 ms, Queueing time: mean = 78.165 us, max = 26.386 ms, min = -0.001 s, total = 3.094 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 31614 total (1 active), Execution time: mean = 460.835 us, total = 14.569 s, Queueing time: mean = 76.788 us, max = 4.063 ms, min = -0.000 s, total = 2.428 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 7920 total (1 active), Execution time: mean = 17.643 us, total = 139.733 ms, Queueing time: mean = 76.218 us, max = 2.581 ms, min = 7.438 us, total = 603.643 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 7919 total (1 active), Execution time: mean = 9.615 
us, total = 76.141 ms, Queueing time: mean = 179.222 us, max = 2.947 ms, min = 2.735 us, total = 1.419 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 7919 total (1 active), Execution time: mean = 3.168 us, total = 25.087 ms, Queueing time: mean = 183.577 us, max = 2.946 ms, min = 3.845 us, total = 1.454 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 7917 total (0 active), Execution time: mean = 615.919 us, total = 4.876 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 7917 total (0 active), Execution time: mean = 98.003 us, total = 775.891 ms, Queueing time: mean = 114.160 us, max = 2.934 ms, min = 4.027 us, total = 903.803 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2641 total (1 active), Execution time: mean = 10.115 us, total = 26.714 ms, Queueing time: mean = 76.630 us, max = 564.135 us, min = 12.181 us, total = 202.381 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1584 total (1 active), Execution time: mean = 584.962 us, total = 926.580 ms, Queueing time: mean = 347.878 us, max = 2.010 ms, min = 9.115 us, total = 551.039 ms [state-dump] NodeManager.GcsCheckAlive - 1584 total (1 active), Execution time: mean = 321.327 us, total = 508.982 ms, Queueing time: mean = 611.066 us, max = 2.567 ms, min = 6.690 us, total = 967.929 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1584 total (0 active), Execution time: mean = 56.179 us, total = 88.987 ms, Queueing time: mean = 112.804 us, max = 4.779 ms, min = 11.561 us, total = 178.682 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1584 total (0 active), Execution time: mean = 1.588 ms, total = 2.515 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 792 total (1 
active), Execution time: mean = 1.790 ms, total = 1.418 s, Queueing time: mean = 73.295 us, max = 248.460 us, min = 11.609 us, total = 58.050 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 132 total (1 active, 1 running), Execution time: mean = 2.768 ms, total = 365.311 ms, Queueing time: mean = 76.243 us, max = 325.100 us, min = 9.635 us, total = 10.064 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution 
time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 15 total (1 active), Execution time: mean = 519.841 s, total = 7797.614 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 14 total (0 active), Execution time: mean = 384.839 us, total = 5.388 ms, Queueing time: mean = 129.434 us, max = 279.406 us, min = 20.299 us, total = 1.812 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 
us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 9 total (1 active), Execution time: mean = 7.823 us, total = 70.403 us, Queueing time: mean = 60.810 us, max = 97.290 us, min = 24.344 us, total = 547.290 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 
total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: 
mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 01:14:54,943 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 01:14:56,304 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: 
[21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: 
[state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles 
being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 
[state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 696265 total (35 active) [state-dump] Queueing time: mean = 169.380 us, max = 59.826 s, min = -0.001 s, total = 117.933 s [state-dump] Execution time: mean = 11.382 ms, total = 7924.952 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 167517 total (0 active), Execution time: mean = 533.175 us, total = 89.316 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 167517 total (0 active), Execution time: mean = 35.512 us, total = 5.949 s, Queueing time: mean = 112.133 us, max = 3.855 ms, min = 1.846 us, total = 18.784 s [state-dump] RaySyncer.OnDemandBroadcasting - 79722 total (1 active), Execution time: mean = 11.188 us, total = 891.899 ms, Queueing time: mean = 93.158 us, max = 25.869 ms, min = 6.166 us, total = 7.427 s [state-dump] NodeManager.CheckGC - 79722 total (1 active), Execution time: mean = 3.679 us, total = 293.328 ms, Queueing time: mean = 99.792 us, max = 25.875 ms, min = -0.000 s, total = 7.956 s [state-dump] ObjectManager.UpdateAvailableMemory - 79722 total (0 active), Execution time: mean = 6.415 us, total = 511.404 ms, Queueing time: mean = 112.506 us, max = 45.939 ms, min = 2.228 us, total = 8.969 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 39884 total (1 active), Execution time: mean = 19.192 us, total = 765.468 ms, Queueing time: mean = 78.154 us, max = 26.386 ms, min = -0.001 s, total = 3.117 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 31854 total (1 active), Execution time: mean = 460.829 us, total = 14.679 s, Queueing time: mean = 76.776 us, max = 4.063 ms, min = -0.000 s, total = 2.446 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 7980 total (1 active), Execution time: mean = 17.646 us, total = 140.816 ms, Queueing time: mean = 76.211 us, max = 
2.581 ms, min = 7.438 us, total = 608.167 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 7979 total (1 active), Execution time: mean = 9.615 us, total = 76.718 ms, Queueing time: mean = 179.189 us, max = 2.947 ms, min = 2.735 us, total = 1.430 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 7979 total (1 active), Execution time: mean = 3.168 us, total = 25.275 ms, Queueing time: mean = 183.544 us, max = 2.946 ms, min = 3.845 us, total = 1.464 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 7977 total (0 active), Execution time: mean = 615.921 us, total = 4.913 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 7977 total (0 active), Execution time: mean = 97.988 us, total = 781.649 ms, Queueing time: mean = 114.162 us, max = 2.934 ms, min = 4.027 us, total = 910.671 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2661 total (1 active), Execution time: mean = 10.118 us, total = 26.924 ms, Queueing time: mean = 76.648 us, max = 564.135 us, min = 12.181 us, total = 203.960 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1596 total (1 active), Execution time: mean = 585.040 us, total = 933.724 ms, Queueing time: mean = 347.670 us, max = 2.010 ms, min = 9.115 us, total = 554.881 ms [state-dump] NodeManager.GcsCheckAlive - 1596 total (1 active), Execution time: mean = 321.113 us, total = 512.496 ms, Queueing time: mean = 611.125 us, max = 2.567 ms, min = 6.690 us, total = 975.355 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1596 total (0 active), Execution time: mean = 56.158 us, total = 89.628 ms, Queueing time: mean = 112.761 us, max = 4.779 ms, min = 11.561 us, total = 179.966 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1596 total (0 active), Execution time: mean = 1.586 ms, total = 2.532 s, 
Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 798 total (1 active), Execution time: mean = 1.790 ms, total = 1.429 s, Queueing time: mean = 73.243 us, max = 248.460 us, min = 11.609 us, total = 58.448 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 133 total (1 active, 1 running), Execution time: mean = 2.767 ms, total = 367.990 ms, Queueing time: mean = 76.134 us, max = 325.100 us, min = 9.635 us, total = 10.126 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 
269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 15 total (1 active), Execution time: mean = 519.841 s, total = 7797.614 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 14 total (0 active), Execution time: mean = 384.839 us, total = 5.388 ms, Queueing time: mean = 129.434 us, max = 279.406 us, min = 20.299 us, total = 1.812 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 
us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 9 total (1 active), Execution time: mean = 7.823 us, total = 70.403 us, Queueing time: mean = 60.810 us, max = 97.290 us, min = 24.344 us, total = 547.290 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 
us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 01:15:54,943 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 01:15:56,306 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], 
memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - 
cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A 
[state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active 
subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 701499 total (35 active) [state-dump] Queueing time: mean = 168.697 us, max = 59.826 s, min = -0.001 s, total = 118.340 s [state-dump] Execution time: mean = 11.298 ms, total = 7925.863 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 168777 total (0 active), Execution time: mean = 533.086 us, total = 89.973 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 168777 total (0 active), Execution time: mean = 35.494 us, total = 5.991 s, Queueing time: mean = 112.107 us, max = 3.855 ms, min = 1.846 us, total = 18.921 s [state-dump] RaySyncer.OnDemandBroadcasting - 80322 total (1 active), Execution time: mean = 11.186 us, total = 898.502 ms, Queueing time: mean = 93.127 us, max = 25.869 ms, min = 6.166 us, total = 7.480 s [state-dump] NodeManager.CheckGC - 80322 total (1 active), Execution time: mean = 3.674 us, total = 295.082 ms, Queueing time: mean = 99.765 us, max = 25.875 ms, min = -0.000 s, total = 8.013 s [state-dump] ObjectManager.UpdateAvailableMemory - 80322 total (0 active), Execution time: mean = 6.414 us, total = 515.164 ms, Queueing time: mean = 112.509 us, max = 45.939 ms, min = 2.228 us, total = 9.037 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 40184 total (1 active), Execution time: mean = 19.192 us, total = 771.197 ms, Queueing time: mean = 78.122 us, max = 26.386 ms, min = -0.001 s, total = 3.139 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 32093 total (1 active), Execution time: mean = 460.796 us, total = 14.788 s, Queueing time: mean = 76.760 us, max = 4.063 ms, min = -0.000 s, total = 2.463 s 
[state-dump] NodeManager.ScheduleAndDispatchTasks - 8040 total (1 active), Execution time: mean = 17.645 us, total = 141.869 ms, Queueing time: mean = 76.175 us, max = 2.581 ms, min = 7.438 us, total = 612.450 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 8039 total (1 active), Execution time: mean = 9.614 us, total = 77.287 ms, Queueing time: mean = 179.263 us, max = 2.947 ms, min = 2.735 us, total = 1.441 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 8039 total (1 active), Execution time: mean = 3.167 us, total = 25.458 ms, Queueing time: mean = 183.615 us, max = 2.946 ms, min = 3.845 us, total = 1.476 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 8037 total (0 active), Execution time: mean = 616.013 us, total = 4.951 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 8037 total (0 active), Execution time: mean = 97.965 us, total = 787.345 ms, Queueing time: mean = 114.193 us, max = 2.934 ms, min = 4.027 us, total = 917.770 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2681 total (1 active), Execution time: mean = 10.117 us, total = 27.123 ms, Queueing time: mean = 76.675 us, max = 564.135 us, min = 12.181 us, total = 205.565 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1608 total (1 active), Execution time: mean = 584.799 us, total = 940.357 ms, Queueing time: mean = 348.233 us, max = 2.010 ms, min = 9.115 us, total = 559.958 ms [state-dump] NodeManager.GcsCheckAlive - 1608 total (1 active), Execution time: mean = 320.754 us, total = 515.772 ms, Queueing time: mean = 611.844 us, max = 2.567 ms, min = 6.690 us, total = 983.846 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1608 total (0 active), Execution time: mean = 56.112 us, total = 90.228 ms, Queueing time: mean = 112.636 us, max = 4.779 ms, min = 11.561 
us, total = 181.118 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1608 total (0 active), Execution time: mean = 1.585 ms, total = 2.549 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 804 total (1 active), Execution time: mean = 1.791 ms, total = 1.440 s, Queueing time: mean = 73.444 us, max = 248.460 us, min = 11.609 us, total = 59.049 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 134 total (1 active, 1 running), Execution time: mean = 2.757 ms, total = 369.410 ms, Queueing time: mean = 75.798 us, max = 325.100 us, min = 9.635 us, total = 10.157 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 
0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 15 total (1 active), Execution time: mean = 519.841 s, total = 7797.614 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 14 total (0 active), Execution time: mean = 384.839 us, total = 5.388 ms, Queueing time: mean = 129.434 us, max = 279.406 us, min = 20.299 us, total = 1.812 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] 
NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 9 total (1 active), Execution time: mean = 7.823 us, total = 70.403 us, Queueing time: mean = 60.810 us, max = 97.290 us, min = 24.344 us, total = 547.290 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 
us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived 
- 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 01:16:54,943 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 01:16:56,310 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] 
num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 
0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: 
BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] 
- cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 706731 total (35 active) [state-dump] Queueing time: mean = 168.074 us, max = 59.826 s, min = -0.001 s, total = 118.783 s [state-dump] Execution time: mean = 11.216 ms, total = 7926.828 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 170037 total (0 active), Execution time: mean = 533.232 us, total = 90.669 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 170037 total (0 active), Execution time: mean = 35.504 us, total = 6.037 s, Queueing time: mean = 112.174 us, max = 3.855 ms, min = 1.846 us, total = 19.074 s [state-dump] RaySyncer.OnDemandBroadcasting - 80921 total (1 active), Execution time: mean = 11.191 us, total = 905.547 ms, Queueing time: mean = 93.179 us, max = 25.869 ms, min = 6.166 us, total = 7.540 s [state-dump] NodeManager.CheckGC - 80921 total (1 active), Execution time: mean = 3.669 us, total = 296.930 ms, Queueing time: mean = 99.825 us, max = 25.875 ms, min = -0.000 s, total = 8.078 s [state-dump] ObjectManager.UpdateAvailableMemory - 80921 total (0 active), Execution time: mean = 6.420 us, total = 519.481 ms, Queueing time: mean = 112.580 us, max = 45.939 ms, min = 2.228 us, total = 9.110 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 40483 total (1 active), Execution time: mean = 19.201 us, total = 777.300 ms, Queueing time: mean = 78.113 us, max = 26.386 ms, min = -0.001 s, total = 3.162 s [state-dump] 
MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 32333 total (1 active), Execution time: mean = 460.908 us, total = 14.903 s, Queueing time: mean = 76.784 us, max = 4.063 ms, min = -0.000 s, total = 2.483 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 8100 total (1 active), Execution time: mean = 17.661 us, total = 143.055 ms, Queueing time: mean = 76.252 us, max = 2.581 ms, min = 7.438 us, total = 617.642 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 8099 total (1 active), Execution time: mean = 9.617 us, total = 77.890 ms, Queueing time: mean = 179.268 us, max = 2.947 ms, min = 2.735 us, total = 1.452 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 8099 total (1 active), Execution time: mean = 3.167 us, total = 25.652 ms, Queueing time: mean = 183.620 us, max = 2.946 ms, min = 3.845 us, total = 1.487 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 8097 total (0 active), Execution time: mean = 616.110 us, total = 4.989 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 8097 total (0 active), Execution time: mean = 98.028 us, total = 793.735 ms, Queueing time: mean = 114.236 us, max = 2.934 ms, min = 4.027 us, total = 924.970 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2701 total (1 active), Execution time: mean = 10.115 us, total = 27.322 ms, Queueing time: mean = 76.651 us, max = 564.135 us, min = 12.181 us, total = 207.035 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1620 total (1 active), Execution time: mean = 584.566 us, total = 946.997 ms, Queueing time: mean = 348.440 us, max = 2.010 ms, min = 9.115 us, total = 564.473 ms [state-dump] NodeManager.GcsCheckAlive - 1620 total (1 active), Execution time: mean = 320.463 us, total = 519.150 ms, Queueing time: mean = 612.121 us, max = 2.567 ms, min = 6.690 us, total = 991.636 ms [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1620 total (0 active), Execution time: mean = 56.077 us, total = 90.845 ms, Queueing time: mean = 112.710 us, max = 4.779 ms, min = 11.561 us, total = 182.590 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1620 total (0 active), Execution time: mean = 1.584 ms, total = 2.566 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 810 total (1 active), Execution time: mean = 1.790 ms, total = 1.450 s, Queueing time: mean = 73.445 us, max = 248.460 us, min = 11.609 us, total = 59.490 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 135 total (1 active, 1 running), Execution time: mean = 2.758 ms, total = 372.391 ms, Queueing time: mean = 75.725 us, max = 325.100 us, min = 9.635 us, total = 10.223 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms 
[state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 15 total (1 active), Execution time: mean = 519.841 s, total = 7797.614 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 14 total (0 active), Execution time: mean = 384.839 us, total = 5.388 ms, Queueing time: mean = 129.434 us, max = 279.406 us, min = 20.299 us, total = 1.812 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] 
WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 10 total (1 active), Execution time: mean = 8.497 us, total = 84.973 us, Queueing time: mean = 64.210 us, max = 97.290 us, min = 24.344 us, total = 642.101 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 
117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution 
time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 01:17:54,943 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 01:17:56,313 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 
[state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] 
================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 
total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - 
cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 711965 total (35 active) [state-dump] Queueing time: mean = 167.434 us, max = 59.826 s, min = -0.001 s, total = 119.207 s [state-dump] Execution time: mean = 11.135 ms, total = 7927.769 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 171297 total (0 active), Execution time: mean = 533.259 us, total = 91.346 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 171297 total (0 active), Execution time: mean = 35.497 us, total = 6.080 s, Queueing time: mean = 112.216 us, max = 3.855 ms, min = 1.846 us, total = 19.222 s [state-dump] RaySyncer.OnDemandBroadcasting - 81521 total (1 active), Execution time: mean = 11.197 us, total = 912.800 ms, Queueing time: mean = 93.174 us, max = 25.869 ms, min = 6.166 us, total = 7.596 s [state-dump] NodeManager.CheckGC - 81521 total (1 active), Execution time: mean = 3.665 us, total = 298.740 ms, Queueing time: mean = 99.831 us, max = 25.875 ms, min = -0.000 s, total = 8.138 s [state-dump] ObjectManager.UpdateAvailableMemory - 81521 total (0 active), Execution time: mean = 6.423 us, total = 523.571 ms, Queueing time: mean = 112.580 us, max = 45.939 ms, min = 2.228 us, total = 9.178 s [state-dump] 
RayletWorkerPool.deadline_timer.kill_idle_workers - 40783 total (1 active), Execution time: mean = 19.221 us, total = 783.896 ms, Queueing time: mean = 78.142 us, max = 26.386 ms, min = -0.001 s, total = 3.187 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 32572 total (1 active), Execution time: mean = 460.932 us, total = 15.013 s, Queueing time: mean = 76.784 us, max = 4.063 ms, min = -0.000 s, total = 2.501 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 8160 total (1 active), Execution time: mean = 17.680 us, total = 144.269 ms, Queueing time: mean = 76.252 us, max = 2.581 ms, min = 7.438 us, total = 622.218 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 8159 total (1 active), Execution time: mean = 9.620 us, total = 78.488 ms, Queueing time: mean = 179.257 us, max = 2.947 ms, min = 2.735 us, total = 1.463 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 8159 total (1 active), Execution time: mean = 3.167 us, total = 25.838 ms, Queueing time: mean = 183.608 us, max = 2.946 ms, min = 3.845 us, total = 1.498 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 8157 total (0 active), Execution time: mean = 616.307 us, total = 5.027 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 8157 total (0 active), Execution time: mean = 98.081 us, total = 800.044 ms, Queueing time: mean = 114.320 us, max = 2.934 ms, min = 4.027 us, total = 932.507 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2721 total (1 active), Execution time: mean = 10.111 us, total = 27.511 ms, Queueing time: mean = 76.672 us, max = 564.135 us, min = 12.181 us, total = 208.625 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1632 total (1 active), Execution time: mean = 584.365 us, total = 953.684 ms, Queueing time: mean = 348.645 us, max = 2.010 ms, min = 9.115 us, total = 568.988 
ms [state-dump] NodeManager.GcsCheckAlive - 1632 total (1 active), Execution time: mean = 320.444 us, total = 522.964 ms, Queueing time: mean = 612.116 us, max = 2.567 ms, min = 6.690 us, total = 998.974 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1632 total (0 active), Execution time: mean = 56.073 us, total = 91.511 ms, Queueing time: mean = 112.712 us, max = 4.779 ms, min = 11.561 us, total = 183.946 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1632 total (0 active), Execution time: mean = 1.584 ms, total = 2.585 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 816 total (1 active), Execution time: mean = 1.790 ms, total = 1.461 s, Queueing time: mean = 73.659 us, max = 248.460 us, min = 11.609 us, total = 60.106 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 136 total (1 active, 1 running), Execution time: mean = 2.761 ms, total = 375.505 ms, Queueing time: mean = 75.987 us, max = 325.100 us, min = 9.635 us, total = 10.334 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 
us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 15 total (1 active), Execution time: mean = 519.841 s, total = 7797.614 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 14 total (0 active), Execution time: mean = 384.839 us, total = 5.388 ms, Queueing time: mean = 129.434 us, max = 279.406 us, min = 20.299 us, total = 1.812 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 
9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 10 total (1 active), Execution time: mean = 8.497 us, total = 84.973 us, Queueing time: mean = 64.210 us, max = 97.290 us, min = 24.344 us, total = 642.101 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] 
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: 
mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 01:18:54,944 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 01:18:56,317 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 
================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled 
unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: 
[state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative 
unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 717197 total (35 active) [state-dump] Queueing time: mean = 166.762 us, max = 59.826 s, min = -0.001 s, total = 119.601 s [state-dump] Execution time: mean = 11.055 ms, total = 7928.612 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 172557 total (0 active), Execution time: mean = 532.827 us, total = 91.943 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 172557 total (0 active), Execution time: mean = 35.451 us, total = 6.117 s, Queueing time: mean = 112.142 us, max = 3.855 ms, min = 1.846 us, total = 19.351 s [state-dump] RaySyncer.OnDemandBroadcasting - 82120 total (1 active), Execution time: mean = 11.197 us, total = 919.523 ms, Queueing time: mean = 93.166 us, max = 25.869 ms, min = 6.166 us, total = 7.651 s [state-dump] NodeManager.CheckGC - 82120 total (1 active), Execution time: mean = 3.659 us, total = 300.487 ms, Queueing time: mean = 99.829 us, max = 25.875 ms, min = -0.000 s, total = 8.198 s [state-dump] 
ObjectManager.UpdateAvailableMemory - 82120 total (0 active), Execution time: mean = 6.420 us, total = 527.214 ms, Queueing time: mean = 112.523 us, max = 45.939 ms, min = 2.228 us, total = 9.240 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 41083 total (1 active), Execution time: mean = 19.217 us, total = 789.497 ms, Queueing time: mean = 78.095 us, max = 26.386 ms, min = -0.001 s, total = 3.208 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 32812 total (1 active), Execution time: mean = 460.873 us, total = 15.122 s, Queueing time: mean = 76.745 us, max = 4.063 ms, min = -0.000 s, total = 2.518 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 8220 total (1 active), Execution time: mean = 17.672 us, total = 145.267 ms, Queueing time: mean = 76.175 us, max = 2.581 ms, min = 7.438 us, total = 626.158 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 8219 total (1 active), Execution time: mean = 9.610 us, total = 78.986 ms, Queueing time: mean = 179.308 us, max = 2.947 ms, min = 2.735 us, total = 1.474 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 8219 total (1 active), Execution time: mean = 3.165 us, total = 26.009 ms, Queueing time: mean = 183.654 us, max = 2.946 ms, min = 3.845 us, total = 1.509 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 8217 total (0 active), Execution time: mean = 616.100 us, total = 5.062 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 8217 total (0 active), Execution time: mean = 98.019 us, total = 805.419 ms, Queueing time: mean = 114.260 us, max = 2.934 ms, min = 4.027 us, total = 938.870 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2741 total (1 active), Execution time: mean = 10.107 us, total = 27.702 ms, Queueing time: mean = 76.619 us, max = 564.135 us, min = 12.181 us, total = 210.011 ms 
[state-dump] NodeManager.deadline_timer.record_metrics - 1644 total (1 active), Execution time: mean = 584.227 us, total = 960.469 ms, Queueing time: mean = 349.082 us, max = 2.010 ms, min = 9.115 us, total = 573.890 ms [state-dump] NodeManager.GcsCheckAlive - 1644 total (1 active), Execution time: mean = 320.132 us, total = 526.297 ms, Queueing time: mean = 612.706 us, max = 2.567 ms, min = 6.690 us, total = 1.007 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1644 total (0 active), Execution time: mean = 56.029 us, total = 92.112 ms, Queueing time: mean = 112.572 us, max = 4.779 ms, min = 11.561 us, total = 185.068 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1644 total (0 active), Execution time: mean = 1.582 ms, total = 2.600 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 822 total (1 active), Execution time: mean = 1.791 ms, total = 1.472 s, Queueing time: mean = 73.609 us, max = 248.460 us, min = 11.609 us, total = 60.506 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 137 total (1 active, 1 running), Execution time: mean = 2.763 ms, total = 378.507 ms, Queueing time: mean = 75.910 us, max = 325.100 us, min = 9.635 us, total = 10.400 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 
956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 15 total (1 active), Execution time: mean = 519.841 s, total = 7797.614 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 14 total (0 active), Execution time: mean = 384.839 us, total = 5.388 ms, Queueing time: mean = 129.434 us, max = 279.406 us, min = 20.299 us, total = 1.812 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 
28.359 us, total = 40.413 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 10 total (1 active), Execution time: mean = 8.497 us, total = 84.973 us, Queueing time: mean = 64.210 us, max = 97.290 us, min = 24.344 us, total = 642.101 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing 
time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 01:19:54,944 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 01:19:56,320 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, 
node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] 
Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 
[state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 722428 total (35 active) [state-dump] Queueing time: mean = 166.153 us, 
max = 59.826 s, min = -0.001 s, total = 120.034 s [state-dump] Execution time: mean = 10.976 ms, total = 7929.551 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 173817 total (0 active), Execution time: mean = 532.856 us, total = 92.619 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 173817 total (0 active), Execution time: mean = 35.446 us, total = 6.161 s, Queueing time: mean = 112.234 us, max = 3.855 ms, min = 1.846 us, total = 19.508 s [state-dump] RaySyncer.OnDemandBroadcasting - 82719 total (1 active), Execution time: mean = 11.201 us, total = 926.516 ms, Queueing time: mean = 93.165 us, max = 25.869 ms, min = 6.166 us, total = 7.706 s [state-dump] NodeManager.CheckGC - 82719 total (1 active), Execution time: mean = 3.655 us, total = 302.346 ms, Queueing time: mean = 99.834 us, max = 25.875 ms, min = -0.000 s, total = 8.258 s [state-dump] ObjectManager.UpdateAvailableMemory - 82719 total (0 active), Execution time: mean = 6.423 us, total = 531.294 ms, Queueing time: mean = 112.549 us, max = 45.939 ms, min = 2.228 us, total = 9.310 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 41383 total (1 active), Execution time: mean = 19.219 us, total = 795.357 ms, Queueing time: mean = 78.078 us, max = 26.386 ms, min = -0.001 s, total = 3.231 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 33051 total (1 active), Execution time: mean = 460.927 us, total = 15.234 s, Queueing time: mean = 76.782 us, max = 4.063 ms, min = -0.000 s, total = 2.538 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 8280 total (1 active), Execution time: mean = 17.688 us, total = 146.461 ms, Queueing time: mean = 76.218 us, max = 2.581 ms, min = 7.438 us, total = 631.081 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 8279 total (1 active), Execution time: mean = 9.607 
us, total = 79.540 ms, Queueing time: mean = 179.207 us, max = 2.947 ms, min = 2.735 us, total = 1.484 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 8279 total (1 active), Execution time: mean = 3.163 us, total = 26.189 ms, Queueing time: mean = 183.548 us, max = 2.946 ms, min = 3.845 us, total = 1.520 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 8277 total (0 active), Execution time: mean = 616.304 us, total = 5.101 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 8277 total (0 active), Execution time: mean = 98.049 us, total = 811.549 ms, Queueing time: mean = 114.350 us, max = 2.934 ms, min = 4.027 us, total = 946.476 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2761 total (1 active), Execution time: mean = 10.115 us, total = 27.927 ms, Queueing time: mean = 76.693 us, max = 564.135 us, min = 12.181 us, total = 211.750 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1656 total (1 active), Execution time: mean = 583.950 us, total = 967.021 ms, Queueing time: mean = 348.837 us, max = 2.010 ms, min = 9.115 us, total = 577.673 ms [state-dump] NodeManager.GcsCheckAlive - 1656 total (1 active), Execution time: mean = 319.772 us, total = 529.542 ms, Queueing time: mean = 612.519 us, max = 2.567 ms, min = 6.690 us, total = 1.014 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1656 total (0 active), Execution time: mean = 55.971 us, total = 92.688 ms, Queueing time: mean = 112.733 us, max = 4.779 ms, min = 11.561 us, total = 186.686 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1656 total (0 active), Execution time: mean = 1.580 ms, total = 2.617 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 828 total (1 active), 
Execution time: mean = 1.790 ms, total = 1.482 s, Queueing time: mean = 74.023 us, max = 248.460 us, min = 11.609 us, total = 61.291 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 138 total (1 active, 1 running), Execution time: mean = 2.772 ms, total = 382.537 ms, Queueing time: mean = 75.956 us, max = 325.100 us, min = 9.635 us, total = 10.482 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 
119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 15 total (1 active), Execution time: mean = 519.841 s, total = 7797.614 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 14 total (0 active), Execution time: mean = 384.839 us, total = 5.388 ms, Queueing time: mean = 129.434 us, max = 279.406 us, min = 20.299 us, total = 1.812 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 
9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 10 total (1 active), Execution time: mean = 8.497 us, total = 84.973 us, Queueing time: mean = 64.210 us, max = 97.290 us, min = 24.344 us, total = 642.101 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 
active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 
122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 01:20:54,944 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 01:20:56,323 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: 
[21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: 
[state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles 
being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 
[state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 727663 total (35 active) [state-dump] Queueing time: mean = 165.558 us, max = 59.826 s, min = -0.001 s, total = 120.471 s [state-dump] Execution time: mean = 10.899 ms, total = 7930.488 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 175077 total (0 active), Execution time: mean = 532.865 us, total = 93.292 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 175077 total (0 active), Execution time: mean = 35.425 us, total = 6.202 s, Queueing time: mean = 112.283 us, max = 3.855 ms, min = 1.846 us, total = 19.658 s [state-dump] RaySyncer.OnDemandBroadcasting - 83319 total (1 active), Execution time: mean = 11.207 us, total = 933.761 ms, Queueing time: mean = 93.177 us, max = 25.869 ms, min = 6.166 us, total = 7.763 s [state-dump] NodeManager.CheckGC - 83319 total (1 active), Execution time: mean = 3.651 us, total = 304.197 ms, Queueing time: mean = 99.856 us, max = 25.875 ms, min = -0.000 s, total = 8.320 s [state-dump] ObjectManager.UpdateAvailableMemory - 83319 total (0 active), Execution time: mean = 6.426 us, total = 535.438 ms, Queueing time: mean = 112.592 us, max = 45.939 ms, min = 2.228 us, total = 9.381 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 41683 total (1 active), Execution time: mean = 19.236 us, total = 801.801 ms, Queueing time: mean = 78.114 us, max = 26.386 ms, min = -0.001 s, total = 3.256 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 33291 total (1 active), Execution time: mean = 460.987 us, total = 15.347 s, Queueing time: mean = 76.784 us, max = 4.063 ms, min = -0.000 s, total = 2.556 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 8340 total (1 active), Execution time: mean = 17.691 us, total = 147.545 ms, Queueing time: mean = 76.181 us, max = 
2.581 ms, min = 7.438 us, total = 635.346 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 8339 total (1 active), Execution time: mean = 9.601 us, total = 80.066 ms, Queueing time: mean = 179.295 us, max = 2.947 ms, min = 2.735 us, total = 1.495 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 8339 total (1 active), Execution time: mean = 3.162 us, total = 26.364 ms, Queueing time: mean = 183.633 us, max = 2.946 ms, min = 3.845 us, total = 1.531 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 8337 total (0 active), Execution time: mean = 616.766 us, total = 5.142 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 8337 total (0 active), Execution time: mean = 98.040 us, total = 817.357 ms, Queueing time: mean = 114.710 us, max = 2.934 ms, min = 4.027 us, total = 956.340 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2781 total (1 active), Execution time: mean = 10.129 us, total = 28.168 ms, Queueing time: mean = 76.697 us, max = 564.135 us, min = 12.181 us, total = 213.294 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1668 total (1 active), Execution time: mean = 584.132 us, total = 974.332 ms, Queueing time: mean = 349.025 us, max = 2.010 ms, min = 9.115 us, total = 582.173 ms [state-dump] NodeManager.GcsCheckAlive - 1668 total (1 active), Execution time: mean = 319.477 us, total = 532.888 ms, Queueing time: mean = 613.225 us, max = 2.567 ms, min = 6.690 us, total = 1.023 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1668 total (0 active), Execution time: mean = 55.957 us, total = 93.336 ms, Queueing time: mean = 112.575 us, max = 4.779 ms, min = 11.561 us, total = 187.775 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1668 total (0 active), Execution time: mean = 1.579 ms, total = 2.634 s, Queueing 
time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 834 total (1 active), Execution time: mean = 1.790 ms, total = 1.493 s, Queueing time: mean = 74.298 us, max = 248.460 us, min = 11.609 us, total = 61.964 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 139 total (1 active, 1 running), Execution time: mean = 2.774 ms, total = 385.638 ms, Queueing time: mean = 75.655 us, max = 325.100 us, min = 9.635 us, total = 10.516 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, 
min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 15 total (1 active), Execution time: mean = 519.841 s, total = 7797.614 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 14 total (0 active), Execution time: mean = 384.839 us, total = 5.388 ms, Queueing time: mean = 129.434 us, max = 279.406 us, min = 20.299 us, total = 1.812 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 
50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 10 total (1 active), Execution time: mean = 8.497 us, total = 84.973 us, Queueing time: mean = 64.210 us, max = 97.290 us, min = 24.344 us, total = 642.101 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, 
Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 01:21:54,945 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 01:21:56,326 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], 
memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - 
cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A 
[state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active 
subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 732896 total (35 active) [state-dump] Queueing time: mean = 165.010 us, max = 59.826 s, min = -0.001 s, total = 120.935 s [state-dump] Execution time: mean = 11.641 ms, total = 8531.466 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 176337 total (0 active), Execution time: mean = 533.075 us, total = 94.001 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 176337 total (0 active), Execution time: mean = 35.420 us, total = 6.246 s, Queueing time: mean = 112.490 us, max = 3.855 ms, min = 1.846 us, total = 19.836 s [state-dump] RaySyncer.OnDemandBroadcasting - 83918 total (1 active), Execution time: mean = 11.215 us, total = 941.132 ms, Queueing time: mean = 93.206 us, max = 25.869 ms, min = 6.166 us, total = 7.822 s [state-dump] NodeManager.CheckGC - 83918 total (1 active), Execution time: mean = 3.647 us, total = 306.081 ms, Queueing time: mean = 99.895 us, max = 25.875 ms, min = -0.000 s, total = 8.383 s [state-dump] ObjectManager.UpdateAvailableMemory - 83918 total (0 active), Execution time: mean = 6.430 us, total = 539.610 ms, Queueing time: mean = 112.602 us, max = 45.939 ms, min = 2.228 us, total = 9.449 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 41983 total (1 active), Execution time: mean = 19.244 us, total = 807.919 ms, Queueing time: mean = 78.117 us, max = 26.386 ms, min = -0.001 s, total = 3.280 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 33530 total (1 active), Execution time: mean = 461.104 us, total = 15.461 s, Queueing time: mean = 76.788 us, max = 4.063 ms, min = -0.000 s, total = 2.575 s 
[state-dump] NodeManager.ScheduleAndDispatchTasks - 8400 total (1 active), Execution time: mean = 17.706 us, total = 148.728 ms, Queueing time: mean = 76.222 us, max = 2.581 ms, min = 7.438 us, total = 640.267 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 8399 total (1 active), Execution time: mean = 9.600 us, total = 80.632 ms, Queueing time: mean = 179.475 us, max = 2.947 ms, min = 2.735 us, total = 1.507 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 8399 total (1 active), Execution time: mean = 3.162 us, total = 26.558 ms, Queueing time: mean = 183.809 us, max = 2.946 ms, min = 3.845 us, total = 1.544 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 8397 total (0 active), Execution time: mean = 616.775 us, total = 5.179 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 8397 total (0 active), Execution time: mean = 98.050 us, total = 823.325 ms, Queueing time: mean = 114.739 us, max = 2.934 ms, min = 4.027 us, total = 963.460 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2801 total (1 active), Execution time: mean = 10.131 us, total = 28.377 ms, Queueing time: mean = 76.704 us, max = 564.135 us, min = 12.181 us, total = 214.848 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1680 total (1 active), Execution time: mean = 584.817 us, total = 982.493 ms, Queueing time: mean = 349.153 us, max = 2.010 ms, min = 9.115 us, total = 586.576 ms [state-dump] NodeManager.GcsCheckAlive - 1680 total (1 active), Execution time: mean = 319.324 us, total = 536.465 ms, Queueing time: mean = 614.228 us, max = 2.567 ms, min = 6.690 us, total = 1.032 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1680 total (0 active), Execution time: mean = 55.929 us, total = 93.961 ms, Queueing time: mean = 112.568 us, max = 4.779 ms, min = 11.561 us, 
total = 189.114 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1680 total (0 active), Execution time: mean = 1.579 ms, total = 2.653 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 840 total (1 active), Execution time: mean = 1.790 ms, total = 1.504 s, Queueing time: mean = 75.255 us, max = 466.018 us, min = 11.609 us, total = 63.214 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 140 total (1 active, 1 running), Execution time: mean = 2.776 ms, total = 388.631 ms, Queueing time: mean = 75.593 us, max = 325.100 us, min = 9.635 us, total = 10.583 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s 
[state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 16 total (1 active), Execution time: mean = 524.851 s, total = 8397.615 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 15 total (0 active), Execution time: mean = 381.405 us, total = 5.721 ms, Queueing time: mean = 123.825 us, max = 279.406 us, min = 20.299 us, total = 1.857 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] 
NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 10 total (1 active), Execution time: mean = 8.497 us, total = 84.973 us, Queueing time: mean = 64.210 us, max = 97.290 us, min = 24.344 us, total = 642.101 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 
36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 01:22:54,945 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 01:22:56,329 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] 
num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: 
[state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 
inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 
[state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 738131 total (35 active) [state-dump] Queueing time: mean = 164.416 us, max = 59.826 s, min = -0.001 s, total = 121.361 s [state-dump] Execution time: mean = 11.559 ms, total = 8532.385 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 177597 total (0 active), Execution time: mean = 533.004 us, total = 94.660 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 177597 total (0 active), Execution time: mean = 35.403 us, total = 6.288 s, Queueing time: mean = 112.523 us, max = 3.855 ms, min = 1.846 us, total = 19.984 s [state-dump] RaySyncer.OnDemandBroadcasting - 84518 total (1 active), Execution time: mean = 11.218 us, total = 948.127 ms, Queueing time: mean = 93.216 us, max = 25.869 ms, min = 6.166 us, total = 7.878 s [state-dump] NodeManager.CheckGC - 84518 total (1 active), Execution time: mean = 3.643 us, total = 307.884 ms, Queueing time: mean = 99.912 us, max = 25.875 ms, min = -0.000 s, total = 8.444 s [state-dump] ObjectManager.UpdateAvailableMemory - 84518 total (0 active), Execution time: mean = 6.431 us, total = 543.561 ms, Queueing time: mean = 112.649 us, max = 45.939 ms, min = 2.228 us, total = 9.521 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 42283 total (1 active), Execution time: mean = 19.246 us, total = 813.777 ms, Queueing time: mean = 78.093 us, max 
= 26.386 ms, min = -0.001 s, total = 3.302 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 33770 total (1 active), Execution time: mean = 461.113 us, total = 15.572 s, Queueing time: mean = 76.767 us, max = 4.063 ms, min = -0.000 s, total = 2.592 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 8460 total (1 active), Execution time: mean = 17.697 us, total = 149.717 ms, Queueing time: mean = 76.172 us, max = 2.581 ms, min = 7.438 us, total = 644.415 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 8459 total (1 active), Execution time: mean = 9.592 us, total = 81.141 ms, Queueing time: mean = 179.455 us, max = 2.947 ms, min = 2.735 us, total = 1.518 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 8459 total (1 active), Execution time: mean = 3.160 us, total = 26.730 ms, Queueing time: mean = 183.786 us, max = 2.946 ms, min = 3.845 us, total = 1.555 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 8457 total (0 active), Execution time: mean = 616.826 us, total = 5.216 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 8457 total (0 active), Execution time: mean = 98.048 us, total = 829.194 ms, Queueing time: mean = 114.866 us, max = 2.934 ms, min = 4.027 us, total = 971.422 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2821 total (1 active), Execution time: mean = 10.137 us, total = 28.598 ms, Queueing time: mean = 76.713 us, max = 564.135 us, min = 12.181 us, total = 216.409 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1692 total (1 active), Execution time: mean = 584.614 us, total = 989.167 ms, Queueing time: mean = 349.288 us, max = 2.010 ms, min = 9.115 us, total = 590.995 ms [state-dump] NodeManager.GcsCheckAlive - 1692 total (1 active), Execution time: mean = 319.365 us, total = 540.365 ms, Queueing time: mean = 614.128 us, max = 
2.567 ms, min = 6.690 us, total = 1.039 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1692 total (0 active), Execution time: mean = 55.927 us, total = 94.628 ms, Queueing time: mean = 112.530 us, max = 4.779 ms, min = 11.561 us, total = 190.401 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1692 total (0 active), Execution time: mean = 1.579 ms, total = 2.671 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 846 total (1 active), Execution time: mean = 1.791 ms, total = 1.515 s, Queueing time: mean = 75.236 us, max = 466.018 us, min = 11.609 us, total = 63.650 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 141 total (1 active, 1 running), Execution time: mean = 2.778 ms, total = 391.678 ms, Queueing time: mean = 75.669 us, max = 325.100 us, min = 9.635 us, total = 10.669 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 
us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 16 total (1 active), Execution time: mean = 524.851 s, total = 8397.615 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 15 total (0 active), Execution time: mean = 381.405 us, total = 5.721 ms, Queueing time: mean = 123.825 us, max = 279.406 us, min = 20.299 us, total = 1.857 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, 
total = 1.407 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 10 total (1 active), Execution time: mean = 8.497 us, total = 84.973 us, Queueing time: mean = 64.210 us, max = 97.290 us, min = 24.344 us, total = 642.101 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 
702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo 
- 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 01:23:54,945 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 01:23:56,332 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] 
num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by 
scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request 
bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel 
WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 743362 total (35 active) [state-dump] Queueing time: mean = 163.828 us, max = 59.826 s, min = -0.001 s, total = 121.783 s [state-dump] Execution time: mean = 11.479 ms, total = 8533.310 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 178857 total (0 active), Execution time: mean = 532.964 us, total = 95.324 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 178857 total (0 active), Execution time: mean = 35.388 us, total = 6.329 s, Queueing time: mean = 112.527 us, max = 3.855 ms, min = 1.846 us, total = 20.126 s [state-dump] RaySyncer.OnDemandBroadcasting - 85117 total (1 active), Execution time: mean = 11.223 us, total = 955.252 ms, Queueing time: mean = 93.226 us, max = 25.869 ms, min = 6.166 us, total = 7.935 s [state-dump] NodeManager.CheckGC - 85117 total (1 active), Execution time: mean = 3.639 us, total = 309.701 ms, Queueing time: mean = 99.930 us, max = 25.875 ms, min = -0.000 s, total = 8.506 s [state-dump] ObjectManager.UpdateAvailableMemory - 85117 total (0 active), Execution time: mean = 6.433 us, total = 547.521 ms, Queueing time: mean = 112.680 us, max = 45.939 ms, min = 2.228 us, total = 9.591 s [state-dump] 
RayletWorkerPool.deadline_timer.kill_idle_workers - 42583 total (1 active), Execution time: mean = 19.245 us, total = 819.528 ms, Queueing time: mean = 78.087 us, max = 26.386 ms, min = -0.001 s, total = 3.325 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 34009 total (1 active), Execution time: mean = 461.108 us, total = 15.682 s, Queueing time: mean = 76.812 us, max = 4.063 ms, min = -0.000 s, total = 2.612 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 8520 total (1 active), Execution time: mean = 17.703 us, total = 150.831 ms, Queueing time: mean = 76.192 us, max = 2.581 ms, min = 7.438 us, total = 649.158 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 8519 total (1 active), Execution time: mean = 9.591 us, total = 81.703 ms, Queueing time: mean = 179.442 us, max = 2.947 ms, min = 2.735 us, total = 1.529 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 8519 total (1 active), Execution time: mean = 3.160 us, total = 26.917 ms, Queueing time: mean = 183.773 us, max = 2.946 ms, min = 3.845 us, total = 1.566 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 8517 total (0 active), Execution time: mean = 616.941 us, total = 5.254 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 8517 total (0 active), Execution time: mean = 98.051 us, total = 835.097 ms, Queueing time: mean = 114.898 us, max = 2.934 ms, min = 4.027 us, total = 978.585 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2841 total (1 active), Execution time: mean = 10.136 us, total = 28.795 ms, Queueing time: mean = 76.697 us, max = 564.135 us, min = 12.181 us, total = 217.896 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1704 total (1 active), Execution time: mean = 584.185 us, total = 995.451 ms, Queueing time: mean = 349.613 us, max = 2.010 ms, min = 9.115 us, total = 595.740 
ms [state-dump] NodeManager.GcsCheckAlive - 1704 total (1 active), Execution time: mean = 319.466 us, total = 544.370 ms, Queueing time: mean = 613.942 us, max = 2.567 ms, min = 6.690 us, total = 1.046 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1704 total (0 active), Execution time: mean = 55.908 us, total = 95.268 ms, Queueing time: mean = 112.535 us, max = 4.779 ms, min = 11.561 us, total = 191.759 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1704 total (0 active), Execution time: mean = 1.579 ms, total = 2.691 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 852 total (1 active), Execution time: mean = 1.790 ms, total = 1.525 s, Queueing time: mean = 75.163 us, max = 466.018 us, min = 11.609 us, total = 64.039 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 142 total (1 active, 1 running), Execution time: mean = 2.779 ms, total = 394.641 ms, Queueing time: mean = 75.755 us, max = 325.100 us, min = 9.635 us, total = 10.757 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, 
total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 16 total (1 active), Execution time: mean = 524.851 s, total = 8397.615 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 15 total (0 active), Execution time: mean = 381.405 us, total = 5.721 ms, Queueing time: mean = 123.825 us, max = 279.406 us, min = 20.299 us, total = 1.857 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 
9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 10 total (1 active), Execution time: mean = 8.497 us, total = 84.973 us, Queueing time: mean = 64.210 us, max = 97.290 us, min = 24.344 us, total = 642.101 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] 
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: 
mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 01:24:54,945 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 01:24:56,335 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 
================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled 
unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: 
[state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative 
unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 748593 total (35 active) [state-dump] Queueing time: mean = 163.168 us, max = 59.826 s, min = -0.001 s, total = 122.146 s [state-dump] Execution time: mean = 11.400 ms, total = 8534.152 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 180117 total (0 active), Execution time: mean = 532.561 us, total = 95.923 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 180117 total (0 active), Execution time: mean = 35.342 us, total = 6.366 s, Queueing time: mean = 112.389 us, max = 3.855 ms, min = 1.846 us, total = 20.243 s [state-dump] RaySyncer.OnDemandBroadcasting - 85716 total (1 active), Execution time: mean = 11.216 us, total = 961.414 ms, Queueing time: mean = 93.161 us, max = 25.869 ms, min = 6.166 us, total = 7.985 s [state-dump] NodeManager.CheckGC - 85716 total (1 active), Execution time: mean = 3.632 us, total = 311.360 ms, Queueing time: mean = 99.865 us, max = 25.875 ms, min = -0.000 s, total = 8.560 s [state-dump] 
ObjectManager.UpdateAvailableMemory - 85716 total (0 active), Execution time: mean = 6.426 us, total = 550.788 ms, Queueing time: mean = 112.562 us, max = 45.939 ms, min = 2.228 us, total = 9.648 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 42882 total (1 active), Execution time: mean = 19.225 us, total = 824.411 ms, Queueing time: mean = 77.998 us, max = 26.386 ms, min = -0.001 s, total = 3.345 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 34249 total (1 active), Execution time: mean = 461.013 us, total = 15.789 s, Queueing time: mean = 76.749 us, max = 4.063 ms, min = -0.000 s, total = 2.629 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 8580 total (1 active), Execution time: mean = 17.685 us, total = 151.739 ms, Queueing time: mean = 76.103 us, max = 2.581 ms, min = 7.438 us, total = 652.961 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 8579 total (1 active), Execution time: mean = 9.581 us, total = 82.195 ms, Queueing time: mean = 179.520 us, max = 2.947 ms, min = 2.735 us, total = 1.540 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 8579 total (1 active), Execution time: mean = 3.157 us, total = 27.086 ms, Queueing time: mean = 183.846 us, max = 2.946 ms, min = 3.845 us, total = 1.577 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 8577 total (0 active), Execution time: mean = 616.474 us, total = 5.287 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 8577 total (0 active), Execution time: mean = 97.997 us, total = 840.516 ms, Queueing time: mean = 114.798 us, max = 2.934 ms, min = 4.027 us, total = 984.620 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2861 total (1 active), Execution time: mean = 10.123 us, total = 28.962 ms, Queueing time: mean = 76.576 us, max = 564.135 us, min = 12.181 us, total = 219.084 ms 
[state-dump] NodeManager.deadline_timer.record_metrics - 1716 total (1 active), Execution time: mean = 584.392 us, total = 1.003 s, Queueing time: mean = 349.788 us, max = 2.010 ms, min = 9.115 us, total = 600.236 ms [state-dump] NodeManager.GcsCheckAlive - 1716 total (1 active), Execution time: mean = 319.381 us, total = 548.058 ms, Queueing time: mean = 614.412 us, max = 2.567 ms, min = 6.690 us, total = 1.054 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1716 total (0 active), Execution time: mean = 55.871 us, total = 95.874 ms, Queueing time: mean = 112.391 us, max = 4.779 ms, min = 11.561 us, total = 192.862 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1716 total (0 active), Execution time: mean = 1.579 ms, total = 2.709 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 858 total (1 active), Execution time: mean = 1.791 ms, total = 1.536 s, Queueing time: mean = 75.069 us, max = 466.018 us, min = 11.609 us, total = 64.410 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 143 total (1 active, 1 running), Execution time: mean = 2.778 ms, total = 397.319 ms, Queueing time: mean = 75.676 us, max = 325.100 us, min = 9.635 us, total = 10.822 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 
us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 16 total (1 active), Execution time: mean = 524.851 s, total = 8397.615 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 15 total (0 active), Execution time: mean = 381.405 us, total = 5.721 ms, Queueing time: mean = 123.825 us, max = 279.406 us, min = 20.299 us, total = 1.857 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, 
total = 40.413 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 10 total (1 active), Execution time: mean = 8.497 us, total = 84.973 us, Queueing time: mean = 64.210 us, max = 97.290 us, min = 24.344 us, total = 642.101 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing 
time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 01:25:54,946 I 13636 13664] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 01:25:56,338 I 13636 13636] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, 
node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2729954820020547849 Local resources: {"total":{CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {CPU: [200000], memory: [844966629380000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], node:__internal_head__: [10000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2729954820020547849{"total":{object_store_memory: 21474836480000, CPU: 200000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, accelerator_type:A40: 10000, memory: 844966629380000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844966629380000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"94df2f5134fc51771c1dcb24cd489e24913981486355e9608be28de3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] 
Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 
[state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 753827 total (35 active) [state-dump] Queueing time: mean = 162.614 us, 
max = 59.826 s, min = -0.001 s, total = 122.583 s [state-dump] Execution time: mean = 11.322 ms, total = 8535.084 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 181377 total (0 active), Execution time: mean = 532.554 us, total = 96.593 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 181377 total (0 active), Execution time: mean = 35.339 us, total = 6.410 s, Queueing time: mean = 112.438 us, max = 3.855 ms, min = 1.846 us, total = 20.394 s [state-dump] RaySyncer.OnDemandBroadcasting - 86316 total (1 active), Execution time: mean = 11.221 us, total = 968.586 ms, Queueing time: mean = 93.190 us, max = 25.869 ms, min = 6.166 us, total = 8.044 s [state-dump] NodeManager.CheckGC - 86316 total (1 active), Execution time: mean = 3.628 us, total = 313.180 ms, Queueing time: mean = 99.901 us, max = 25.875 ms, min = -0.000 s, total = 8.623 s [state-dump] ObjectManager.UpdateAvailableMemory - 86316 total (0 active), Execution time: mean = 6.428 us, total = 554.839 ms, Queueing time: mean = 112.583 us, max = 45.939 ms, min = 2.228 us, total = 9.718 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 43182 total (1 active), Execution time: mean = 19.226 us, total = 830.208 ms, Queueing time: mean = 78.083 us, max = 26.386 ms, min = -0.001 s, total = 3.372 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 34488 total (1 active), Execution time: mean = 461.095 us, total = 15.902 s, Queueing time: mean = 76.785 us, max = 4.063 ms, min = -0.000 s, total = 2.648 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 8640 total (1 active), Execution time: mean = 17.696 us, total = 152.893 ms, Queueing time: mean = 76.140 us, max = 2.581 ms, min = 7.438 us, total = 657.850 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 8639 total (1 active), Execution time: mean = 9.586 
us, total = 82.814 ms, Queueing time: mean = 179.527 us, max = 2.947 ms, min = 2.735 us, total = 1.551 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 8639 total (1 active), Execution time: mean = 3.158 us, total = 27.281 ms, Queueing time: mean = 183.856 us, max = 2.946 ms, min = 3.845 us, total = 1.588 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 8637 total (0 active), Execution time: mean = 616.177 us, total = 5.322 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 8637 total (0 active), Execution time: mean = 97.965 us, total = 846.122 ms, Queueing time: mean = 114.772 us, max = 2.934 ms, min = 4.027 us, total = 991.289 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 2881 total (1 active), Execution time: mean = 10.123 us, total = 29.163 ms, Queueing time: mean = 76.632 us, max = 564.135 us, min = 12.181 us, total = 220.778 ms [state-dump] NodeManager.deadline_timer.record_metrics - 1728 total (1 active), Execution time: mean = 584.130 us, total = 1.009 s, Queueing time: mean = 350.080 us, max = 2.010 ms, min = 9.115 us, total = 604.938 ms [state-dump] NodeManager.GcsCheckAlive - 1728 total (1 active), Execution time: mean = 319.493 us, total = 552.084 ms, Queueing time: mean = 614.311 us, max = 2.567 ms, min = 6.690 us, total = 1.062 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 1728 total (0 active), Execution time: mean = 55.859 us, total = 96.524 ms, Queueing time: mean = 112.392 us, max = 4.779 ms, min = 11.561 us, total = 194.214 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 1728 total (0 active), Execution time: mean = 1.579 ms, total = 2.728 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 864 total (1 active), 
Execution time: mean = 1.791 ms, total = 1.548 s, Queueing time: mean = 75.122 us, max = 466.018 us, min = 11.609 us, total = 64.905 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 144 total (1 active, 1 running), Execution time: mean = 2.780 ms, total = 400.283 ms, Queueing time: mean = 75.710 us, max = 325.100 us, min = 9.635 us, total = 10.902 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 7.620 us, total = 723.861 us, Queueing time: mean = 660.815 ms, max = 59.826 s, min = 29.537 us, total = 62.777 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 752.373 us, total = 55.676 ms, Queueing time: mean = 18.898 us, max = 247.615 us, min = 4.453 us, total = 1.398 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 2.129 us, total = 46.834 us, Queueing time: mean = 43.475 us, max = 264.443 us, min = 17.727 us, total = 956.459 us [state-dump] CoreWorkerService.grpc_client.LocalGC.OnReplyReceived - 21 total (0 active), Execution time: mean = 65.428 us, total = 1.374 ms, Queueing time: mean = 149.534 us, max = 223.485 us, min = 56.603 us, total = 3.140 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 13.522 us, total = 283.961 us, Queueing time: mean = 115.200 us, max = 203.067 us, min = 15.853 us, total = 2.419 ms [state-dump] CoreWorkerService.grpc_client.LocalGC - 21 total (0 active), Execution time: mean = 93.349 ms, total = 1.960 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.654 us, total = 391.728 us, Queueing time: mean = 66.932 us, max = 269.040 us, min = 24.280 us, total = 1.406 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 
119.783 us, total = 2.515 ms, Queueing time: mean = 119.709 us, max = 193.975 us, min = 26.529 us, total = 2.514 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 800.451 us, total = 16.809 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 16 total (1 active), Execution time: mean = 524.851 s, total = 8397.615 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 15 total (0 active), Execution time: mean = 381.405 us, total = 5.721 ms, Queueing time: mean = 123.825 us, max = 279.406 us, min = 20.299 us, total = 1.857 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 204.482 us, total = 2.658 ms, Queueing time: mean = 3.109 ms, max = 9.915 ms, min = 28.359 us, total = 40.413 ms [state-dump] RaySyncer.BroadcastMessage - 12 total (0 active), Execution time: mean = 229.790 us, total = 2.757 ms, Queueing time: mean = 796.500 ns, max = 1.330 us, min = 180.000 ns, total = 9.558 us [state-dump] - 12 total (0 active), Execution time: mean = 1.109 us, total = 13.308 us, Queueing time: mean = 117.235 us, max = 207.332 us, min = 32.817 us, total = 1.407 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.488 us, total = 244.878 us, Queueing time: mean = 114.997 us, max = 206.869 us, min = 23.653 us, total = 1.150 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 133.400 us, total = 1.334 ms, Queueing time: mean = 133.654 us, max = 258.016 us, min = 50.206 us, total = 1.337 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 967.372 us, total = 
9.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 698.937 us, total = 6.989 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 129.906 us, total = 1.299 ms, Queueing time: mean = 109.463 us, max = 137.712 us, min = 31.332 us, total = 1.095 ms [state-dump] NodeManager.GCTaskFailureReason - 10 total (1 active), Execution time: mean = 8.497 us, total = 84.973 us, Queueing time: mean = 64.210 us, max = 97.290 us, min = 24.344 us, total = 642.101 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.320 ms, total = 2.640 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.053 us, total = 4.106 us, Queueing time: mean = 222.000 ns, max = 369.000 ns, min = 75.000 ns, total = 444.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.119 us, total = 284.237 us, Queueing time: mean = 702.200 us, max = 1.287 ms, min = 117.240 us, total = 1.404 ms [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 92.080 us, total = 92.080 us, Queueing time: mean = 36.492 us, max = 36.492 us, min = 36.492 us, total = 36.492 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 74.750 us, total = 74.750 us, Queueing time: mean = 225.429 us, max = 225.429 us, min = 225.429 us, total = 225.429 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 
active), Execution time: mean = 1.746 ms, total = 1.746 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.935 ms, total = 1.935 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.022 s, total = 1.022 s, Queueing time: mean = 126.779 us, max = 126.779 us, min = 126.779 us, total = 126.779 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.178 ms, total = 2.178 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 43.810 us, total = 43.810 us, Queueing time: mean = 116.080 us, max = 116.080 us, min = 116.080 us, total = 116.080 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.002 ms, total = 2.002 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.278 ms, total = 1.278 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 288.995 us, total = 288.995 us, Queueing time: mean = 122.868 us, max = 122.868 us, min = 122.868 us, total = 122.868 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 242.297 us, total = 242.297 us, Queueing time: mean = 
122.977 us, max = 122.977 us, min = 122.977 us, total = 122.977 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump]