[2025-01-20 22:53:02,840 I 11815 11815] (raylet) main.cc:180: Setting cluster ID to: 9ba956de09a13ad1bbed4734328a99f6ee8647c3b5e312c581e20f93
[2025-01-20 22:53:02,850 I 11815 11815] (raylet) main.cc:289: Raylet is not set to kill unknown children.
[2025-01-20 22:53:02,850 I 11815 11815] (raylet) io_service_pool.cc:35: IOServicePool is running with 1 io_service.
[2025-01-20 22:53:02,850 I 11815 11815] (raylet) main.cc:419: Setting node ID node_id=13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf
[2025-01-20 22:53:02,850 I 11815 11815] (raylet) store_runner.cc:32: Allowing the Plasma store to use up to 2.14748GB of memory.
[2025-01-20 22:53:02,850 I 11815 11815] (raylet) store_runner.cc:48: Starting object store with directory /dev/shm, fallback /tmp/ray, and huge page support disabled
[2025-01-20 22:53:02,851 I 11815 11843] (raylet) dlmalloc.cc:154: create_and_mmap_buffer(2147483656, /dev/shm/plasmaXXXXXX)
[2025-01-20 22:53:02,852 I 11815 11843] (raylet) store.cc:564: Plasma store debug dump:
Current usage: 0 / 2.14748 GB
- num bytes created total: 0
0 pending objects of total size 0MB
- objects spillable: 0
- bytes spillable: 0
- objects unsealed: 0
- bytes unsealed: 0
- objects in use: 0
- bytes in use: 0
- objects evictable: 0
- bytes evictable: 0
- objects created by worker: 0
- bytes created by worker: 0
- objects restored: 0
- bytes restored: 0
- objects received: 0
- bytes received: 0
- objects errored: 0
- bytes errored: 0
[2025-01-20 22:53:03,856 I 11815 11815] (raylet) grpc_server.cc:134: ObjectManager server started, listening on port 45981.
[2025-01-20 22:53:03,860 I 11815 11815] (raylet) worker_killing_policy.cc:101: Running GroupByOwner policy.
[2025-01-20 22:53:03,861 I 11815 11815] (raylet) memory_monitor.cc:47: MemoryMonitor initialized with usage threshold at 94999994368 bytes (0.95 system memory), total system memory bytes: 99999997952
[2025-01-20 22:53:03,861 I 11815 11815] (raylet) node_manager.cc:287: Initializing NodeManager node_id=13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf
[2025-01-20 22:53:03,862 I 11815 11815] (raylet) grpc_server.cc:134: NodeManager server started, listening on port 34189.
[2025-01-20 22:53:03,871 I 11815 11908] (raylet) agent_manager.cc:77: Monitor agent process with name dashboard_agent/424238335
[2025-01-20 22:53:03,875 I 11815 11910] (raylet) agent_manager.cc:77: Monitor agent process with name runtime_env_agent
[2025-01-20 22:53:03,875 I 11815 11815] (raylet) event.cc:493: Ray Event initialized for RAYLET
[2025-01-20 22:53:03,875 I 11815 11815] (raylet) event.cc:324: Set ray event level to warning
[2025-01-20 22:53:03,878 I 11815 11815] (raylet) raylet.cc:134: Raylet of id, 13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf started. Raylet consists of node_manager and object_manager.
node_manager address: 192.168.0.2:34189 object_manager address: 192.168.0.2:45981 hostname: 0cd925b1f73b [2025-01-20 22:53:03,880 I 11815 11815] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, accelerator_type:A40: 10000, node:__internal_head__: 10000, memory: 844922429440000, object_store_memory: 21474836480000, CPU: 200000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2158256074887862688 Local resources: {"total":{node:__internal_head__: [10000], accelerator_type:A40: [10000], GPU: [10000, 10000], CPU: [200000], memory: [844922429440000], object_store_memory: [21474836480000], node:192.168.0.2: [10000]}}, "available": {node:__internal_head__: [10000], accelerator_type:A40: [10000], GPU: [10000, 10000], CPU: [200000], memory: [844922429440000], object_store_memory: [21474836480000], node:192.168.0.2: [10000]}}, "labels":{"ray.io/node_id":"13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2158256074887862688{"total":{object_store_memory: 21474836480000, memory: 844922429440000, node:__internal_head__: 10000, node:192.168.0.2: 10000, accelerator_type:A40: 10000, GPU: 20000, CPU: 200000}}, 
"available": {object_store_memory: 21474836480000, memory: 844922429440000, node:__internal_head__: 10000, node:192.168.0.2: 10000, accelerator_type:A40: 10000, GPU: 20000, CPU: 200000}}, "labels":{"ray.io/node_id":"13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s 
[state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 70341898936028000.000 [state-dump] - num location lookups per second: 70341898936016000.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 0 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 0 [state-dump] - num PYTHON drivers: 0 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 0 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 28 total (13 active) [state-dump] Queueing time: mean = 1.862 ms, max = 
15.027 ms, min = 25.785 us, total = 52.143 ms [state-dump] Execution time: mean = 36.945 ms, total = 1.034 s [state-dump] Event stats: [state-dump] PeriodicalRunner.RunFnPeriodically - 11 total (2 active, 1 running), Execution time: mean = 171.887 us, total = 1.891 ms, Queueing time: mean = 4.723 ms, max = 15.027 ms, min = 25.785 us, total = 51.955 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.740 ms, total = 1.740 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.263 ms, total = 2.263 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.026 s, total = 1.026 s, Queueing time: mean = 109.121 us, max = 109.121 us, min = 109.121 us, total = 109.121 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 407.271 us, total = 407.271 us, Queueing time: mean = 44.821 us, max = 44.821 us, min = 44.821 us, total = 44.821 us [state-dump] NodeManager.deadline_timer.record_metrics - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.UpdateAvailableMemory - 1 total (0 active), Execution time: mean = 5.766 us, total = 5.766 us, Queueing time: mean = 33.633 us, max = 33.633 us, min = 33.633 us, total = 33.633 us [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 1 total (0 active), Execution time: mean = 1.828 ms, total = 1.828 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 
9223372036.855 s, total = 0.000 s
[state-dump] NodeManager.deadline_timer.flush_free_objects - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] DebugString() time ms: 0
[state-dump]
[state-dump]
[2025-01-20 22:53:03,882 I 11815 11815] (raylet) accessor.cc:762: Received notification for node, IsAlive = 1 node_id=13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf
[2025-01-20 22:53:03,960 I 11815 11815] (raylet) worker_pool.cc:501: Started worker process with pid 11946, the token is 0
[2025-01-20 22:53:03,964 I 11815 11815] (raylet) worker_pool.cc:501: Started worker process with pid 11947, the token is 1
[2025-01-20 22:53:03,966 I 11815 11815] (raylet) worker_pool.cc:501: Started worker process with pid 11948, the token is 2
[2025-01-20 22:53:03,969 I 11815 11815] (raylet) worker_pool.cc:501: Started worker process with pid 11949, the token is 3
[2025-01-20 22:53:03,971 I 11815 11815] (raylet) worker_pool.cc:501: Started worker process with pid 11950, the token is 4
[2025-01-20 22:53:03,973 I 11815 11815] (raylet) worker_pool.cc:501: Started worker process with pid 11951, the token is 5
[2025-01-20 22:53:03,976 I 11815 11815] (raylet) worker_pool.cc:501: Started worker process with pid 11952, the token is 6
[2025-01-20 22:53:03,978 I 11815 11815] (raylet) worker_pool.cc:501: Started worker process with pid 11953, the token is 7
[2025-01-20 22:53:03,981 I 11815 11815] (raylet) worker_pool.cc:501: Started worker process with pid 11954, the token is 8
[2025-01-20 22:53:03,983 I 11815 11815] (raylet) worker_pool.cc:501: Started worker process with pid 11955, the token is 9
[2025-01-20 22:53:03,985 I 11815 11815] (raylet) worker_pool.cc:501: Started worker process with pid 11956, the token is 10
[2025-01-20 22:53:03,988 I 11815 11815] (raylet) worker_pool.cc:501: Started worker process with pid 11957, the token is 11
[2025-01-20 22:53:03,990 I 11815 11815] (raylet) worker_pool.cc:501: Started worker process with pid 11958, the token is 12
[2025-01-20 22:53:03,993 I 11815 11815] (raylet) worker_pool.cc:501: Started worker process with pid 11959, the token is 13
[2025-01-20 22:53:03,996 I 11815 11815] (raylet) worker_pool.cc:501: Started worker process with pid 11960, the token is 14
[2025-01-20 22:53:03,999 I 11815 11815] (raylet) worker_pool.cc:501: Started worker process with pid 11961, the token is 15
[2025-01-20 22:53:04,002 I 11815 11815] (raylet) worker_pool.cc:501: Started worker process with pid 11962, the token is 16
[2025-01-20 22:53:04,005 I 11815 11815] (raylet) worker_pool.cc:501: Started worker process with pid 11963, the token is 17
[2025-01-20 22:53:04,008 I 11815 11815] (raylet) worker_pool.cc:501: Started worker process with pid 11964, the token is 18
[2025-01-20 22:53:04,010 I 11815 11815] (raylet) worker_pool.cc:501: Started worker process with pid 11965, the token is 19
[2025-01-20 22:53:04,662 I 11815 11843] (raylet) object_store.cc:35: Object store current usage 8e-09 / 2.14748 GB.
[2025-01-20 22:53:04,800 I 11815 11815] (raylet) worker_pool.cc:692: Job 01000000 already started in worker pool.
[2025-01-20 22:53:12,865 W 11815 11837] (raylet) metric_exporter.cc:105: [1] Export metrics to agent failed: RpcError: RPC Error message: failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:56214: Failed to connect to remote host: Connection refused; RPC Error details: . This won't affect Ray, but you can lose metrics from the cluster.
[2025-01-20 22:54:02,852 I 11815 11843] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:54:03,883 I 11815 11815] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, accelerator_type:A40: 10000, node:__internal_head__: 10000, memory: 844922429440000, object_store_memory: 21474836480000, CPU: 200000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2158256074887862688 Local resources: {"total":{node:__internal_head__: [10000], accelerator_type:A40: [10000], GPU: [10000, 10000], CPU: [200000], memory: [844922429440000], object_store_memory: [21474836480000], node:192.168.0.2: [10000]}}, "available": {node:__internal_head__: [10000], accelerator_type:A40: [10000], GPU: [10000, 10000], CPU: [200000], memory: 
[844922429440000], object_store_memory: [21474836480000], node:192.168.0.2: [10000]}}, "labels":{"ray.io/node_id":"13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2158256074887862688{"total":{object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, node:192.168.0.2: 10000, accelerator_type:A40: 10000, memory: 844922429440000}}, "available": {object_store_memory: 21474836480000, node:__internal_head__: 10000, CPU: 200000, GPU: 20000, node:192.168.0.2: 10000, accelerator_type:A40: 10000, memory: 844922429440000}}, "labels":{"ray.io/node_id":"13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - 
num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already 
processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 5525 total (35 active) [state-dump] Queueing time: 
mean = 288.468 us, max = 786.606 ms, min = 57.000 ns, total = 1.594 s [state-dump] Execution time: mean = 553.758 us, total = 3.060 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 1260 total (0 active), Execution time: mean = 39.394 us, total = 49.636 ms, Queueing time: mean = 136.724 us, max = 26.128 ms, min = 13.687 us, total = 172.272 ms [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 1260 total (0 active), Execution time: mean = 586.500 us, total = 738.991 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.UpdateAvailableMemory - 600 total (0 active), Execution time: mean = 6.471 us, total = 3.882 ms, Queueing time: mean = 116.762 us, max = 428.341 us, min = 3.115 us, total = 70.057 ms [state-dump] NodeManager.CheckGC - 600 total (1 active), Execution time: mean = 3.263 us, total = 1.958 ms, Queueing time: mean = 156.500 us, max = 28.206 ms, min = 18.930 us, total = 93.900 ms [state-dump] RaySyncer.OnDemandBroadcasting - 600 total (1 active), Execution time: mean = 12.409 us, total = 7.445 ms, Queueing time: mean = 148.294 us, max = 28.199 ms, min = 16.006 us, total = 88.976 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 300 total (1 active), Execution time: mean = 20.535 us, total = 6.160 ms, Queueing time: mean = 77.824 us, max = 890.821 us, min = 11.893 us, total = 23.347 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 240 total (1 active), Execution time: mean = 474.513 us, total = 113.883 ms, Queueing time: mean = 176.070 us, max = 23.328 ms, min = 26.482 us, total = 42.257 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 84 total (21 active), Execution time: mean = 7.684 us, total = 645.453 us, Queueing time: mean = 11.774 ms, max = 786.606 ms, min = 33.517 us, total = 989.031 ms [state-dump] ClientConnection.async_read.ProcessMessage - 63 total (0 
active), Execution time: mean = 1.089 ms, total = 68.619 ms, Queueing time: mean = 46.329 us, max = 369.478 us, min = 3.543 us, total = 2.919 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 60 total (1 active), Execution time: mean = 16.404 us, total = 984.212 us, Queueing time: mean = 70.993 us, max = 127.346 us, min = 11.172 us, total = 4.260 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 60 total (0 active), Execution time: mean = 661.142 us, total = 39.669 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 60 total (1 active), Execution time: mean = 8.736 us, total = 524.160 us, Queueing time: mean = 165.425 us, max = 1.195 ms, min = 20.042 us, total = 9.926 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 60 total (1 active), Execution time: mean = 3.015 us, total = 180.918 us, Queueing time: mean = 169.502 us, max = 1.191 ms, min = 17.091 us, total = 10.170 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 60 total (0 active), Execution time: mean = 102.830 us, total = 6.170 ms, Queueing time: mean = 125.087 us, max = 187.485 us, min = 35.417 us, total = 7.505 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.574 us, total = 34.627 us, Queueing time: mean = 41.270 us, max = 146.493 us, min = 11.116 us, total = 907.929 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 758.936 us, total = 15.938 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 19.347 us, total = 406.289 us, Queueing time: mean = 139.222 us, max = 517.588 us, min = 35.214 us, total = 2.924 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl 
- 21 total (0 active), Execution time: mean = 121.458 us, total = 2.551 ms, Queueing time: mean = 94.581 us, max = 235.076 us, min = 11.959 us, total = 1.986 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 12.500 us, total = 262.503 us, Queueing time: mean = 106.328 us, max = 212.914 us, min = 9.723 us, total = 2.233 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 21 total (1 active), Execution time: mean = 9.424 us, total = 197.897 us, Queueing time: mean = 70.290 us, max = 101.558 us, min = 34.849 us, total = 1.476 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 197.633 us, total = 2.569 ms, Queueing time: mean = 4.308 ms, max = 15.027 ms, min = 25.785 us, total = 56.007 ms [state-dump] NodeManager.deadline_timer.record_metrics - 12 total (1 active), Execution time: mean = 565.578 us, total = 6.787 ms, Queueing time: mean = 249.721 us, max = 664.744 us, min = 59.452 us, total = 2.997 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 12 total (0 active), Execution time: mean = 51.247 us, total = 614.961 us, Queueing time: mean = 124.710 us, max = 211.539 us, min = 40.818 us, total = 1.497 ms [state-dump] NodeManager.GcsCheckAlive - 12 total (1 active), Execution time: mean = 259.428 us, total = 3.113 ms, Queueing time: mean = 528.020 us, max = 908.941 us, min = 300.249 us, total = 6.336 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 12 total (0 active), Execution time: mean = 1.505 ms, total = 18.059 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 6 total (1 active), Execution time: mean = 1.485 ms, total = 8.909 ms, Queueing time: mean = 53.027 us, max = 70.864 us, min = 54.069 us, total = 318.164 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.800 us, total = 3.599 us, 
Queueing time: mean = 209.500 ns, max = 362.000 ns, min = 57.000 ns, total = 419.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.646 ms, total = 3.291 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 460.725 ms, total = 921.449 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 129.215 us, total = 258.430 us, Queueing time: mean = 412.096 us, max = 685.035 us, min = 139.158 us, total = 824.193 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.586 ms, total = 1.586 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 60.113 us, total = 60.113 us, Queueing time: mean = 447.798 us, max = 447.798 us, min = 447.798 us, total = 447.798 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 327.529 us, total = 327.529 us, Queueing time: mean = 115.620 us, max = 115.620 us, min = 115.620 us, total = 115.620 us [state-dump] - 1 total (0 active), Execution time: mean = 657.000 ns, total = 657.000 ns, Queueing time: mean = 182.528 us, max = 182.528 us, min = 182.528 us, total = 182.528 us [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 86.446 us, total = 86.446 us, Queueing time: mean = 403.833 us, max = 403.833 us, min = 403.833 us, total = 403.833 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 29.389 us, total = 29.389 us, Queueing time: mean = 150.068 us, max = 150.068 us, min = 150.068 us, total = 150.068 us [state-dump] RaySyncer.BroadcastMessage - 1 total (0 active), Execution time: mean = 59.018 us, total = 59.018 us, Queueing time: mean = 70.000 ns, max = 70.000 ns, min = 70.000 ns, total = 70.000 ns [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 407.271 us, total = 407.271 us, Queueing time: mean = 44.821 us, max = 44.821 us, min = 44.821 us, total = 44.821 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.740 ms, total = 1.740 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.546 ms, total = 1.546 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 1 total (1 active, 1 running), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.759 ms, total = 1.759 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.263 ms, total = 2.263 ms, Queueing time: mean = 0.000 s, max = 
-0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 132.115 us, total = 132.115 us, Queueing time: mean = 203.939 us, max = 203.939 us, min = 203.939 us, total = 203.939 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.026 s, total = 1.026 s, Queueing time: mean = 109.121 us, max = 109.121 us, min = 109.121 us, total = 109.121 us [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 22:55:02,852 I 11815 11843] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:55:03,885 I 11815 11815] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, accelerator_type:A40: 10000, node:__internal_head__: 10000, memory: 844922429440000, object_store_memory: 21474836480000, CPU: 200000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] 
num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2158256074887862688 Local resources: {"total":{node:__internal_head__: [10000], accelerator_type:A40: [10000], GPU: [10000, 10000], CPU: [200000], memory: [844922429440000], object_store_memory: [21474836480000], node:192.168.0.2: [10000]}}, "available": {node:__internal_head__: [10000], accelerator_type:A40: [10000], GPU: [10000, 10000], CPU: [160000], memory: [844922429440000], object_store_memory: [21474836480000], node:192.168.0.2: [10000]}}, "labels":{"ray.io/node_id":"13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf",} is_draining: 0 is_idle: 0 Cluster resources: node id: -2158256074887862688{"total":{object_store_memory: 21474836480000, memory: 844922429440000, accelerator_type:A40: 10000, node:192.168.0.2: 10000, GPU: 20000, node:__internal_head__: 10000, CPU: 200000}}, "available": {GPU: 20000, node:__internal_head__: 10000, memory: 844922429440000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, CPU: 160000}}, "labels":{"ray.io/node_id":"13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 4 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] - (language=PYTHON actor_or_task=process_single_file pid=11947 worker_id=a076ef60b315c051997c61da25a41330f93cf4ab04ebec2a79e29cc2): {CPU: 10000} [state-dump] - (language=PYTHON actor_or_task=process_single_file pid=11950 
worker_id=e397033d2803f63381ba90f75d3cdd81c98c019f646f24e7c68aad8c): {CPU: 10000} [state-dump] - (language=PYTHON actor_or_task=process_single_file pid=11949 worker_id=15ac23c361271ee3f3657eb716bee5cf4d69204697adb7a1d9f816ae): {CPU: 10000} [state-dump] - (language=PYTHON actor_or_task=process_single_file pid=11954 worker_id=03d0fa710aae153978e181c4a6c55fab840d031b5f864228bcfd7dcf): {CPU: 10000} [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] - {depth=1 function_descriptor={type=PythonFunctionDescriptor, module_name=__main__, class_name=, function_name=process_single_file, function_hash=844ab8a910714a069e942f339e277bbe} scheduling_strategy=default_scheduling_strategy { [state-dump] } [state-dump] resource_set={CPU : 1, }}: 4/20 [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 
s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 16 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 10830 total (35 active) [state-dump] Queueing time: mean = 83.899 ms, max 
= 90.698 s, min = 57.000 ns, total = 908.631 s [state-dump] Execution time: mean = 374.654 us, total = 4.057 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 2520 total (0 active), Execution time: mean = 40.856 us, total = 102.957 ms, Queueing time: mean = 127.102 us, max = 26.128 ms, min = 7.877 us, total = 320.298 ms [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 2520 total (0 active), Execution time: mean = 572.231 us, total = 1.442 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.UpdateAvailableMemory - 1199 total (0 active), Execution time: mean = 6.517 us, total = 7.814 ms, Queueing time: mean = 116.747 us, max = 578.536 us, min = 3.115 us, total = 139.980 ms [state-dump] RaySyncer.OnDemandBroadcasting - 1199 total (1 active), Execution time: mean = 14.151 us, total = 16.966 ms, Queueing time: mean = 120.960 us, max = 28.199 ms, min = 16.006 us, total = 145.032 ms [state-dump] NodeManager.CheckGC - 1199 total (1 active), Execution time: mean = 3.302 us, total = 3.960 ms, Queueing time: mean = 130.842 us, max = 28.206 ms, min = 18.930 us, total = 156.880 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 600 total (1 active), Execution time: mean = 20.113 us, total = 12.068 ms, Queueing time: mean = 75.323 us, max = 890.821 us, min = 11.893 us, total = 45.194 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 479 total (1 active), Execution time: mean = 472.074 us, total = 226.123 ms, Queueing time: mean = 127.372 us, max = 23.328 ms, min = 16.675 us, total = 61.011 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 120 total (1 active), Execution time: mean = 9.069 us, total = 1.088 ms, Queueing time: mean = 181.828 us, max = 2.264 ms, min = 20.042 us, total = 21.819 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 120 total (1 active), Execution time: mean = 
17.565 us, total = 2.108 ms, Queueing time: mean = 94.223 us, max = 2.235 ms, min = 5.788 us, total = 11.307 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 120 total (1 active), Execution time: mean = 3.005 us, total = 360.549 us, Queueing time: mean = 186.092 us, max = 2.259 ms, min = 17.091 us, total = 22.331 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 120 total (0 active), Execution time: mean = 658.322 us, total = 78.999 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 120 total (0 active), Execution time: mean = 112.343 us, total = 13.481 ms, Queueing time: mean = 124.014 us, max = 224.698 us, min = 16.203 us, total = 14.882 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 94 total (21 active), Execution time: mean = 8.045 us, total = 756.212 us, Queueing time: mean = 9.655 s, max = 90.698 s, min = 33.517 us, total = 907.589 s [state-dump] ClientConnection.async_read.ProcessMessage - 73 total (0 active), Execution time: mean = 941.754 us, total = 68.748 ms, Queueing time: mean = 41.799 us, max = 369.478 us, min = 3.543 us, total = 3.051 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 41 total (1 active), Execution time: mean = 9.714 us, total = 398.294 us, Queueing time: mean = 73.600 us, max = 139.513 us, min = 34.849 us, total = 3.018 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 24 total (0 active), Execution time: mean = 52.331 us, total = 1.256 ms, Queueing time: mean = 124.961 us, max = 211.539 us, min = 40.818 us, total = 2.999 ms [state-dump] NodeManager.deadline_timer.record_metrics - 24 total (1 active), Execution time: mean = 558.038 us, total = 13.393 ms, Queueing time: mean = 363.550 us, max = 1.725 ms, min = 17.618 us, total = 8.725 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 24 
total (0 active), Execution time: mean = 1.532 ms, total = 36.772 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GcsCheckAlive - 24 total (1 active), Execution time: mean = 264.611 us, total = 6.351 ms, Queueing time: mean = 639.735 us, max = 2.306 ms, min = 146.953 us, total = 15.354 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.574 us, total = 34.627 us, Queueing time: mean = 41.270 us, max = 146.493 us, min = 11.116 us, total = 907.929 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 19.347 us, total = 406.289 us, Queueing time: mean = 139.222 us, max = 517.588 us, min = 35.214 us, total = 2.924 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 758.936 us, total = 15.938 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 12.500 us, total = 262.503 us, Queueing time: mean = 106.328 us, max = 212.914 us, min = 9.723 us, total = 2.233 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 121.458 us, total = 2.551 ms, Queueing time: mean = 94.581 us, max = 235.076 us, min = 11.959 us, total = 1.986 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 197.633 us, total = 2.569 ms, Queueing time: mean = 4.308 ms, max = 15.027 ms, min = 25.785 us, total = 56.007 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 12 total (1 active), Execution time: mean = 1.703 ms, total = 20.440 ms, Queueing time: mean = 57.427 us, max = 81.809 us, min = 41.543 us, total = 689.128 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 780.093 us, total = 
7.801 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 21.932 us, total = 219.320 us, Queueing time: mean = 109.091 us, max = 200.547 us, min = 20.707 us, total = 1.091 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 115.576 us, total = 1.156 ms, Queueing time: mean = 115.782 us, max = 331.820 us, min = 13.069 us, total = 1.158 ms [state-dump] RaySyncer.BroadcastMessage - 7 total (0 active), Execution time: mean = 226.389 us, total = 1.585 ms, Queueing time: mean = 698.857 ns, max = 931.000 ns, min = 70.000 ns, total = 4.892 us [state-dump] - 7 total (0 active), Execution time: mean = 1.104 us, total = 7.726 us, Queueing time: mean = 116.440 us, max = 182.528 us, min = 35.904 us, total = 815.078 us [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 6 total (0 active), Execution time: mean = 135.555 us, total = 813.332 us, Queueing time: mean = 120.141 us, max = 137.794 us, min = 97.945 us, total = 720.843 us [state-dump] NodeManagerService.grpc_server.ReturnWorker - 6 total (0 active), Execution time: mean = 658.083 us, total = 3.949 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.646 ms, total = 3.291 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.800 us, total = 3.599 us, Queueing time: mean = 209.500 ns, max = 362.000 ns, min = 57.000 ns, total = 419.000 ns [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 2 total (1 active, 1 running), Execution time: mean = 1.441 ms, total = 2.882 ms, Queueing time: mean = 34.472 us, 
max = 68.945 us, min = 68.945 us, total = 68.945 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 129.215 us, total = 258.430 us, Queueing time: mean = 412.096 us, max = 685.035 us, min = 139.158 us, total = 824.193 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 460.725 ms, total = 921.449 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 86.446 us, total = 86.446 us, Queueing time: mean = 403.833 us, max = 403.833 us, min = 403.833 us, total = 403.833 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 29.389 us, total = 29.389 us, Queueing time: mean = 150.068 us, max = 150.068 us, min = 150.068 us, total = 150.068 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 327.529 us, total = 327.529 us, Queueing time: mean = 115.620 us, max = 115.620 us, min = 115.620 us, total = 115.620 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 60.113 us, total = 60.113 us, Queueing time: mean = 447.798 us, max = 447.798 us, min = 447.798 us, total = 447.798 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.546 ms, total = 1.546 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 407.271 us, total = 407.271 us, Queueing time: mean = 44.821 us, max = 44.821 us, min = 44.821 us, total = 44.821 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.759 ms, total = 1.759 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.263 ms, total = 2.263 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 132.115 us, total = 132.115 us, Queueing time: mean = 203.939 us, max = 203.939 us, min = 203.939 us, total = 203.939 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.026 s, total = 1.026 s, Queueing time: mean = 109.121 us, max = 109.121 us, min = 109.121 us, total = 109.121 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.586 ms, total = 1.586 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.740 ms, total = 1.740 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 22:56:02,853 I 11815 11843] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - 
bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:56:03,888 I 11815 11815] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, accelerator_type:A40: 10000, node:__internal_head__: 10000, memory: 844922429440000, object_store_memory: 21474836480000, CPU: 200000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2158256074887862688 Local resources: {"total":{node:__internal_head__: [10000], accelerator_type:A40: [10000], GPU: [10000, 10000], CPU: [200000], memory: [844922429440000], object_store_memory: [21474836480000], node:192.168.0.2: [10000]}}, "available": {node:__internal_head__: [10000], accelerator_type:A40: [10000], GPU: [10000, 10000], CPU: [200000], memory: [844922429440000], object_store_memory: [21474836480000], node:192.168.0.2: [10000]}}, "labels":{"ray.io/node_id":"13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2158256074887862688{"total":{object_store_memory: 21474836480000, 
CPU: 200000, accelerator_type:A40: 10000, GPU: 20000, node:192.168.0.2: 10000, node:__internal_head__: 10000, memory: 844922429440000}}, "available": {object_store_memory: 21474836480000, CPU: 200000, memory: 844922429440000, GPU: 20000, node:192.168.0.2: 10000, accelerator_type:A40: 10000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: 
mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 16078 total (35 active) [state-dump] Queueing time: mean = 64.194 ms, max 
= 123.051 s, min = 57.000 ns, total = 1032.105 s [state-dump] Execution time: mean = 312.876 us, total = 5.030 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 3780 total (0 active), Execution time: mean = 40.201 us, total = 151.961 ms, Queueing time: mean = 123.334 us, max = 26.128 ms, min = 7.877 us, total = 466.204 ms [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 3780 total (0 active), Execution time: mean = 566.923 us, total = 2.143 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 1798 total (1 active), Execution time: mean = 13.649 us, total = 24.541 ms, Queueing time: mean = 110.822 us, max = 28.199 ms, min = 12.819 us, total = 199.257 ms [state-dump] ObjectManager.UpdateAvailableMemory - 1798 total (0 active), Execution time: mean = 6.408 us, total = 11.522 ms, Queueing time: mean = 117.141 us, max = 578.536 us, min = 3.115 us, total = 210.620 ms [state-dump] NodeManager.CheckGC - 1798 total (1 active), Execution time: mean = 3.252 us, total = 5.847 ms, Queueing time: mean = 120.255 us, max = 28.206 ms, min = 10.316 us, total = 216.218 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 900 total (1 active), Execution time: mean = 20.029 us, total = 18.026 ms, Queueing time: mean = 75.650 us, max = 890.821 us, min = 11.893 us, total = 68.085 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 719 total (1 active), Execution time: mean = 469.841 us, total = 337.816 ms, Queueing time: mean = 110.772 us, max = 23.328 ms, min = 14.052 us, total = 79.645 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 180 total (1 active), Execution time: mean = 2.994 us, total = 538.879 us, Queueing time: mean = 187.097 us, max = 2.259 ms, min = 17.091 us, total = 33.677 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 180 total (1 
active), Execution time: mean = 9.082 us, total = 1.635 ms, Queueing time: mean = 182.885 us, max = 2.264 ms, min = 18.179 us, total = 32.919 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 180 total (1 active), Execution time: mean = 16.940 us, total = 3.049 ms, Queueing time: mean = 89.020 us, max = 2.235 ms, min = 5.788 us, total = 16.024 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 180 total (0 active), Execution time: mean = 108.308 us, total = 19.496 ms, Queueing time: mean = 121.080 us, max = 224.698 us, min = 16.203 us, total = 21.794 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 180 total (0 active), Execution time: mean = 652.822 us, total = 117.508 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 8.155 us, total = 774.725 us, Queueing time: mean = 10.849 s, max = 123.051 s, min = 33.517 us, total = 1030.640 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 929.411 us, total = 68.776 ms, Queueing time: mean = 43.207 us, max = 369.478 us, min = 3.543 us, total = 3.197 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 61 total (1 active), Execution time: mean = 9.585 us, total = 584.682 us, Queueing time: mean = 73.869 us, max = 149.711 us, min = 34.849 us, total = 4.506 ms [state-dump] NodeManager.GcsCheckAlive - 36 total (1 active), Execution time: mean = 273.804 us, total = 9.857 ms, Queueing time: mean = 649.494 us, max = 2.306 ms, min = 146.953 us, total = 23.382 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 36 total (0 active), Execution time: mean = 53.829 us, total = 1.938 ms, Queueing time: mean = 123.938 us, max = 211.539 us, min = 40.818 us, total = 4.462 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 36 total 
(0 active), Execution time: mean = 1.537 ms, total = 55.344 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 36 total (1 active), Execution time: mean = 568.417 us, total = 20.463 ms, Queueing time: mean = 361.136 us, max = 1.725 ms, min = 17.618 us, total = 13.001 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.574 us, total = 34.627 us, Queueing time: mean = 41.270 us, max = 146.493 us, min = 11.116 us, total = 907.929 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 121.458 us, total = 2.551 ms, Queueing time: mean = 94.581 us, max = 235.076 us, min = 11.959 us, total = 1.986 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 12.500 us, total = 262.503 us, Queueing time: mean = 106.328 us, max = 212.914 us, min = 9.723 us, total = 2.233 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 758.936 us, total = 15.938 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 19.347 us, total = 406.289 us, Queueing time: mean = 139.222 us, max = 517.588 us, min = 35.214 us, total = 2.924 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 18 total (1 active), Execution time: mean = 1.747 ms, total = 31.437 ms, Queueing time: mean = 63.844 us, max = 99.125 us, min = 41.543 us, total = 1.149 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 197.633 us, total = 2.569 ms, Queueing time: mean = 4.308 ms, max = 15.027 ms, min = 25.785 us, total = 56.007 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 780.093 us, 
total = 7.801 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.BroadcastMessage - 10 total (0 active), Execution time: mean = 221.900 us, total = 2.219 ms, Queueing time: mean = 695.800 ns, max = 931.000 ns, min = 70.000 ns, total = 6.958 us [state-dump] - 10 total (0 active), Execution time: mean = 1.025 us, total = 10.254 us, Queueing time: mean = 114.099 us, max = 182.528 us, min = 27.587 us, total = 1.141 ms [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 21.932 us, total = 219.320 us, Queueing time: mean = 109.091 us, max = 200.547 us, min = 20.707 us, total = 1.091 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 123.767 us, total = 1.238 ms, Queueing time: mean = 96.345 us, max = 137.794 us, min = 36.307 us, total = 963.446 us [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 635.791 us, total = 6.358 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 115.576 us, total = 1.156 ms, Queueing time: mean = 115.782 us, max = 331.820 us, min = 13.069 us, total = 1.158 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 3 total (1 active, 1 running), Execution time: mean = 1.435 ms, total = 4.305 ms, Queueing time: mean = 60.277 us, max = 111.886 us, min = 68.945 us, total = 180.831 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.646 ms, total = 3.291 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 460.725 
ms, total = 921.449 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.800 us, total = 3.599 us, Queueing time: mean = 209.500 ns, max = 362.000 ns, min = 57.000 ns, total = 419.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 129.215 us, total = 258.430 us, Queueing time: mean = 412.096 us, max = 685.035 us, min = 139.158 us, total = 824.193 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.759 ms, total = 1.759 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.586 ms, total = 1.586 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.546 ms, total = 1.546 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.263 ms, total = 2.263 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 132.115 us, total = 132.115 us, Queueing time: mean = 203.939 us, max = 203.939 us, min = 203.939 us, total = 203.939 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 407.271 us, total = 407.271 us, Queueing time: mean = 44.821 us, max = 44.821 us, min = 44.821 us, total = 44.821 us [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.026 s, total = 1.026 s, Queueing time: mean = 109.121 us, max = 109.121 us, min = 109.121 us, total = 109.121 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 29.389 us, total = 29.389 us, Queueing time: mean = 150.068 us, max = 150.068 us, min = 150.068 us, total = 150.068 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 86.446 us, total = 86.446 us, Queueing time: mean = 403.833 us, max = 403.833 us, min = 403.833 us, total = 403.833 us [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 327.529 us, total = 327.529 us, Queueing time: mean = 115.620 us, max = 115.620 us, min = 115.620 us, total = 115.620 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 60.113 us, total = 60.113 us, Queueing time: mean = 447.798 us, max = 447.798 us, min = 447.798 us, total = 447.798 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.740 ms, total = 1.740 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 22:57:02,853 I 11815 11843] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in 
use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:57:03,890 I 11815 11815] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, accelerator_type:A40: 10000, node:__internal_head__: 10000, memory: 844922429440000, object_store_memory: 21474836480000, CPU: 200000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2158256074887862688 Local resources: {"total":{node:__internal_head__: [10000], accelerator_type:A40: [10000], GPU: [10000, 10000], CPU: [200000], memory: [844922429440000], object_store_memory: [21474836480000], node:192.168.0.2: [10000]}}, "available": {node:__internal_head__: [10000], accelerator_type:A40: [10000], GPU: [10000, 10000], CPU: [200000], memory: [844922429440000], object_store_memory: [21474836480000], node:192.168.0.2: [10000]}}, "labels":{"ray.io/node_id":"13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2158256074887862688{"total":{object_store_memory: 
21474836480000, CPU: 200000, accelerator_type:A40: 10000, GPU: 20000, node:192.168.0.2: 10000, node:__internal_head__: 10000, memory: 844922429440000}}, "available": {object_store_memory: 21474836480000, CPU: 200000, memory: 844922429440000, GPU: 20000, node:192.168.0.2: 10000, accelerator_type:A40: 10000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] 
Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 21312 total (35 active) [state-dump] Queueing time: mean = 48.448 ms, max 
= 123.051 s, min = 57.000 ns, total = 1032.529 s [state-dump] Execution time: mean = 281.061 us, total = 5.990 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 5040 total (0 active), Execution time: mean = 40.105 us, total = 202.130 ms, Queueing time: mean = 122.335 us, max = 26.128 ms, min = 5.488 us, total = 616.567 ms [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 5040 total (0 active), Execution time: mean = 563.338 us, total = 2.839 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 2398 total (1 active), Execution time: mean = 12.787 us, total = 30.664 ms, Queueing time: mean = 106.475 us, max = 28.199 ms, min = 12.819 us, total = 255.326 ms [state-dump] ObjectManager.UpdateAvailableMemory - 2398 total (0 active), Execution time: mean = 6.289 us, total = 15.082 ms, Queueing time: mean = 116.860 us, max = 578.536 us, min = 3.115 us, total = 280.229 ms [state-dump] NodeManager.CheckGC - 2398 total (1 active), Execution time: mean = 3.154 us, total = 7.563 ms, Queueing time: mean = 115.150 us, max = 28.206 ms, min = 10.316 us, total = 276.129 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 1200 total (1 active), Execution time: mean = 19.558 us, total = 23.470 ms, Queueing time: mean = 76.781 us, max = 890.821 us, min = 11.893 us, total = 92.137 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 958 total (1 active), Execution time: mean = 464.203 us, total = 444.706 ms, Queueing time: mean = 101.847 us, max = 23.328 ms, min = 9.730 us, total = 97.570 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 240 total (1 active), Execution time: mean = 2.959 us, total = 710.246 us, Queueing time: mean = 183.176 us, max = 2.259 ms, min = 17.091 us, total = 43.962 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 240 total (1 
active), Execution time: mean = 9.006 us, total = 2.161 ms, Queueing time: mean = 179.042 us, max = 2.264 ms, min = 18.179 us, total = 42.970 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 240 total (1 active), Execution time: mean = 16.493 us, total = 3.958 ms, Queueing time: mean = 87.646 us, max = 2.235 ms, min = 5.788 us, total = 21.035 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 240 total (0 active), Execution time: mean = 106.132 us, total = 25.472 ms, Queueing time: mean = 119.979 us, max = 224.698 us, min = 16.203 us, total = 28.795 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 240 total (0 active), Execution time: mean = 651.517 us, total = 156.364 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 8.155 us, total = 774.725 us, Queueing time: mean = 10.849 s, max = 123.051 s, min = 33.517 us, total = 1030.640 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 81 total (1 active), Execution time: mean = 9.476 us, total = 767.540 us, Queueing time: mean = 75.485 us, max = 149.711 us, min = 34.849 us, total = 6.114 ms [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 929.411 us, total = 68.776 ms, Queueing time: mean = 43.207 us, max = 369.478 us, min = 3.543 us, total = 3.197 ms [state-dump] NodeManager.GcsCheckAlive - 48 total (1 active), Execution time: mean = 275.353 us, total = 13.217 ms, Queueing time: mean = 634.979 us, max = 2.306 ms, min = 146.953 us, total = 30.479 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 48 total (0 active), Execution time: mean = 54.345 us, total = 2.609 ms, Queueing time: mean = 122.104 us, max = 218.783 us, min = 36.904 us, total = 5.861 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 48 
total (0 active), Execution time: mean = 1.535 ms, total = 73.672 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 48 total (1 active), Execution time: mean = 564.943 us, total = 27.117 ms, Queueing time: mean = 352.674 us, max = 1.725 ms, min = 17.618 us, total = 16.928 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 24 total (1 active), Execution time: mean = 1.756 ms, total = 42.154 ms, Queueing time: mean = 62.167 us, max = 99.125 us, min = 27.046 us, total = 1.492 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.574 us, total = 34.627 us, Queueing time: mean = 41.270 us, max = 146.493 us, min = 11.116 us, total = 907.929 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 121.458 us, total = 2.551 ms, Queueing time: mean = 94.581 us, max = 235.076 us, min = 11.959 us, total = 1.986 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 12.500 us, total = 262.503 us, Queueing time: mean = 106.328 us, max = 212.914 us, min = 9.723 us, total = 2.233 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 758.936 us, total = 15.938 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 19.347 us, total = 406.289 us, Queueing time: mean = 139.222 us, max = 517.588 us, min = 35.214 us, total = 2.924 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 197.633 us, total = 2.569 ms, Queueing time: mean = 4.308 ms, max = 15.027 ms, min = 25.785 us, total = 56.007 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 
780.093 us, total = 7.801 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] - 10 total (0 active), Execution time: mean = 1.025 us, total = 10.254 us, Queueing time: mean = 114.099 us, max = 182.528 us, min = 27.587 us, total = 1.141 ms [state-dump] RaySyncer.BroadcastMessage - 10 total (0 active), Execution time: mean = 221.900 us, total = 2.219 ms, Queueing time: mean = 695.800 ns, max = 931.000 ns, min = 70.000 ns, total = 6.958 us [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 21.932 us, total = 219.320 us, Queueing time: mean = 109.091 us, max = 200.547 us, min = 20.707 us, total = 1.091 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 123.767 us, total = 1.238 ms, Queueing time: mean = 96.345 us, max = 137.794 us, min = 36.307 us, total = 963.446 us [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 635.791 us, total = 6.358 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 115.576 us, total = 1.156 ms, Queueing time: mean = 115.782 us, max = 331.820 us, min = 13.069 us, total = 1.158 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 4 total (1 active, 1 running), Execution time: mean = 1.841 ms, total = 7.365 ms, Queueing time: mean = 61.663 us, max = 111.886 us, min = 65.823 us, total = 246.654 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.646 ms, total = 3.291 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean 
= 460.725 ms, total = 921.449 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.800 us, total = 3.599 us, Queueing time: mean = 209.500 ns, max = 362.000 ns, min = 57.000 ns, total = 419.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 129.215 us, total = 258.430 us, Queueing time: mean = 412.096 us, max = 685.035 us, min = 139.158 us, total = 824.193 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.759 ms, total = 1.759 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.586 ms, total = 1.586 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.546 ms, total = 1.546 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.263 ms, total = 2.263 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 132.115 us, total = 132.115 us, Queueing time: mean = 203.939 us, max = 203.939 us, min = 203.939 us, total = 203.939 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 407.271 us, total = 407.271 us, Queueing time: mean = 44.821 us, max = 44.821 us, min = 44.821 us, total = 44.821 us [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.026 s, total = 1.026 s, Queueing time: mean = 109.121 us, max = 109.121 us, min = 109.121 us, total = 109.121 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 29.389 us, total = 29.389 us, Queueing time: mean = 150.068 us, max = 150.068 us, min = 150.068 us, total = 150.068 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 86.446 us, total = 86.446 us, Queueing time: mean = 403.833 us, max = 403.833 us, min = 403.833 us, total = 403.833 us [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 327.529 us, total = 327.529 us, Queueing time: mean = 115.620 us, max = 115.620 us, min = 115.620 us, total = 115.620 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 60.113 us, total = 60.113 us, Queueing time: mean = 447.798 us, max = 447.798 us, min = 447.798 us, total = 447.798 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.740 ms, total = 1.740 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 22:58:02,853 I 11815 11843] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in 
use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:58:03,892 I 11815 11815] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, accelerator_type:A40: 10000, node:__internal_head__: 10000, memory: 844922429440000, object_store_memory: 21474836480000, CPU: 200000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2158256074887862688 Local resources: {"total":{node:__internal_head__: [10000], accelerator_type:A40: [10000], GPU: [10000, 10000], CPU: [200000], memory: [844922429440000], object_store_memory: [21474836480000], node:192.168.0.2: [10000]}}, "available": {node:__internal_head__: [10000], accelerator_type:A40: [10000], GPU: [10000, 10000], CPU: [200000], memory: [844922429440000], object_store_memory: [21474836480000], node:192.168.0.2: [10000]}}, "labels":{"ray.io/node_id":"13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2158256074887862688{"total":{object_store_memory: 
21474836480000, CPU: 200000, accelerator_type:A40: 10000, GPU: 20000, node:192.168.0.2: 10000, node:__internal_head__: 10000, memory: 844922429440000}}, "available": {object_store_memory: 21474836480000, CPU: 200000, memory: 844922429440000, GPU: 20000, node:192.168.0.2: 10000, accelerator_type:A40: 10000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] 
Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 26542 total (35 active) [state-dump] Queueing time: mean = 38.918 ms, max 
= 123.051 s, min = 57.000 ns, total = 1032.951 s [state-dump] Execution time: mean = 261.237 us, total = 6.934 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 6299 total (0 active), Execution time: mean = 39.763 us, total = 250.464 ms, Queueing time: mean = 120.954 us, max = 26.128 ms, min = 5.488 us, total = 761.888 ms [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 6299 total (0 active), Execution time: mean = 558.808 us, total = 3.520 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 2997 total (1 active), Execution time: mean = 12.310 us, total = 36.892 ms, Queueing time: mean = 103.324 us, max = 28.199 ms, min = 12.819 us, total = 309.661 ms [state-dump] ObjectManager.UpdateAvailableMemory - 2997 total (0 active), Execution time: mean = 6.245 us, total = 18.715 ms, Queueing time: mean = 117.766 us, max = 578.536 us, min = 3.115 us, total = 352.946 ms [state-dump] NodeManager.CheckGC - 2997 total (1 active), Execution time: mean = 3.111 us, total = 9.325 ms, Queueing time: mean = 111.574 us, max = 28.206 ms, min = 10.316 us, total = 334.388 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 1500 total (1 active), Execution time: mean = 19.103 us, total = 28.654 ms, Queueing time: mean = 76.722 us, max = 1.689 ms, min = 11.893 us, total = 115.084 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 1198 total (1 active), Execution time: mean = 463.069 us, total = 554.756 ms, Queueing time: mean = 96.133 us, max = 23.328 ms, min = 9.730 us, total = 115.168 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 300 total (1 active), Execution time: mean = 2.925 us, total = 877.441 us, Queueing time: mean = 184.547 us, max = 2.259 ms, min = 17.091 us, total = 55.364 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 300 total (1 
active), Execution time: mean = 8.907 us, total = 2.672 ms, Queueing time: mean = 180.495 us, max = 2.264 ms, min = 18.179 us, total = 54.148 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 300 total (1 active), Execution time: mean = 16.091 us, total = 4.827 ms, Queueing time: mean = 84.311 us, max = 2.235 ms, min = 5.788 us, total = 25.293 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 300 total (0 active), Execution time: mean = 104.710 us, total = 31.413 ms, Queueing time: mean = 118.857 us, max = 224.698 us, min = 16.203 us, total = 35.657 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 300 total (0 active), Execution time: mean = 645.681 us, total = 193.704 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 101 total (1 active), Execution time: mean = 9.308 us, total = 940.069 us, Queueing time: mean = 75.694 us, max = 149.711 us, min = 29.037 us, total = 7.645 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 8.155 us, total = 774.725 us, Queueing time: mean = 10.849 s, max = 123.051 s, min = 33.517 us, total = 1030.640 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 929.411 us, total = 68.776 ms, Queueing time: mean = 43.207 us, max = 369.478 us, min = 3.543 us, total = 3.197 ms [state-dump] NodeManager.GcsCheckAlive - 60 total (1 active), Execution time: mean = 285.275 us, total = 17.117 ms, Queueing time: mean = 636.268 us, max = 2.306 ms, min = 125.845 us, total = 38.176 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 60 total (0 active), Execution time: mean = 53.666 us, total = 3.220 ms, Queueing time: mean = 122.430 us, max = 218.783 us, min = 31.716 us, total = 7.346 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 60 
total (0 active), Execution time: mean = 1.533 ms, total = 91.991 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 60 total (1 active), Execution time: mean = 558.932 us, total = 33.536 ms, Queueing time: mean = 368.254 us, max = 1.725 ms, min = 17.618 us, total = 22.095 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 30 total (1 active), Execution time: mean = 1.771 ms, total = 53.130 ms, Queueing time: mean = 64.214 us, max = 99.125 us, min = 27.046 us, total = 1.926 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.574 us, total = 34.627 us, Queueing time: mean = 41.270 us, max = 146.493 us, min = 11.116 us, total = 907.929 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 121.458 us, total = 2.551 ms, Queueing time: mean = 94.581 us, max = 235.076 us, min = 11.959 us, total = 1.986 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 12.500 us, total = 262.503 us, Queueing time: mean = 106.328 us, max = 212.914 us, min = 9.723 us, total = 2.233 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 758.936 us, total = 15.938 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 19.347 us, total = 406.289 us, Queueing time: mean = 139.222 us, max = 517.588 us, min = 35.214 us, total = 2.924 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 197.633 us, total = 2.569 ms, Queueing time: mean = 4.308 ms, max = 15.027 ms, min = 25.785 us, total = 56.007 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 
780.093 us, total = 7.801 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] - 10 total (0 active), Execution time: mean = 1.025 us, total = 10.254 us, Queueing time: mean = 114.099 us, max = 182.528 us, min = 27.587 us, total = 1.141 ms [state-dump] RaySyncer.BroadcastMessage - 10 total (0 active), Execution time: mean = 221.900 us, total = 2.219 ms, Queueing time: mean = 695.800 ns, max = 931.000 ns, min = 70.000 ns, total = 6.958 us [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 21.932 us, total = 219.320 us, Queueing time: mean = 109.091 us, max = 200.547 us, min = 20.707 us, total = 1.091 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 123.767 us, total = 1.238 ms, Queueing time: mean = 96.345 us, max = 137.794 us, min = 36.307 us, total = 963.446 us [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 635.791 us, total = 6.358 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 115.576 us, total = 1.156 ms, Queueing time: mean = 115.782 us, max = 331.820 us, min = 13.069 us, total = 1.158 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 5 total (1 active, 1 running), Execution time: mean = 2.001 ms, total = 10.007 ms, Queueing time: mean = 61.183 us, max = 111.886 us, min = 59.262 us, total = 305.916 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.646 ms, total = 3.291 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: 
mean = 460.725 ms, total = 921.449 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.800 us, total = 3.599 us, Queueing time: mean = 209.500 ns, max = 362.000 ns, min = 57.000 ns, total = 419.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 129.215 us, total = 258.430 us, Queueing time: mean = 412.096 us, max = 685.035 us, min = 139.158 us, total = 824.193 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.759 ms, total = 1.759 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.586 ms, total = 1.586 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.546 ms, total = 1.546 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.263 ms, total = 2.263 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 132.115 us, total = 132.115 us, Queueing time: mean = 203.939 us, max = 203.939 us, min = 203.939 us, total = 203.939 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 407.271 us, total = 407.271 us, Queueing time: mean = 44.821 us, max = 44.821 us, min = 44.821 us, total = 44.821 us [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.026 s, total = 1.026 s, Queueing time: mean = 109.121 us, max = 109.121 us, min = 109.121 us, total = 109.121 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 29.389 us, total = 29.389 us, Queueing time: mean = 150.068 us, max = 150.068 us, min = 150.068 us, total = 150.068 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 86.446 us, total = 86.446 us, Queueing time: mean = 403.833 us, max = 403.833 us, min = 403.833 us, total = 403.833 us [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 327.529 us, total = 327.529 us, Queueing time: mean = 115.620 us, max = 115.620 us, min = 115.620 us, total = 115.620 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 60.113 us, total = 60.113 us, Queueing time: mean = 447.798 us, max = 447.798 us, min = 447.798 us, total = 447.798 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.740 ms, total = 1.740 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 22:59:02,854 I 11815 11843] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in 
use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 22:59:03,895 I 11815 11815] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, accelerator_type:A40: 10000, node:__internal_head__: 10000, memory: 844922429440000, object_store_memory: 21474836480000, CPU: 200000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2158256074887862688 Local resources: {"total":{node:__internal_head__: [10000], accelerator_type:A40: [10000], GPU: [10000, 10000], CPU: [200000], memory: [844922429440000], object_store_memory: [21474836480000], node:192.168.0.2: [10000]}}, "available": {node:__internal_head__: [10000], accelerator_type:A40: [10000], GPU: [10000, 10000], CPU: [200000], memory: [844922429440000], object_store_memory: [21474836480000], node:192.168.0.2: [10000]}}, "labels":{"ray.io/node_id":"13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2158256074887862688{"total":{object_store_memory: 
21474836480000, CPU: 200000, accelerator_type:A40: 10000, GPU: 20000, node:192.168.0.2: 10000, node:__internal_head__: 10000, memory: 844922429440000}}, "available": {object_store_memory: 21474836480000, CPU: 200000, memory: 844922429440000, GPU: 20000, node:192.168.0.2: 10000, accelerator_type:A40: 10000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] 
Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 31772 total (35 active) [state-dump] Queueing time: mean = 32.525 ms, max 
= 123.051 s, min = 57.000 ns, total = 1033.372 s [state-dump] Execution time: mean = 248.588 us, total = 7.898 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 7557 total (0 active), Execution time: mean = 39.544 us, total = 298.837 ms, Queueing time: mean = 119.993 us, max = 26.128 ms, min = 5.488 us, total = 906.789 ms [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 7557 total (0 active), Execution time: mean = 558.523 us, total = 4.221 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 3597 total (1 active), Execution time: mean = 12.077 us, total = 43.442 ms, Queueing time: mean = 101.348 us, max = 28.199 ms, min = 12.819 us, total = 364.550 ms [state-dump] ObjectManager.UpdateAvailableMemory - 3597 total (0 active), Execution time: mean = 6.213 us, total = 22.347 ms, Queueing time: mean = 117.327 us, max = 706.852 us, min = 3.115 us, total = 422.024 ms [state-dump] NodeManager.CheckGC - 3597 total (1 active), Execution time: mean = 3.086 us, total = 11.101 ms, Queueing time: mean = 109.382 us, max = 28.206 ms, min = 10.316 us, total = 393.447 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 1800 total (1 active), Execution time: mean = 19.215 us, total = 34.586 ms, Queueing time: mean = 77.733 us, max = 1.689 ms, min = 11.893 us, total = 139.919 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 1437 total (1 active), Execution time: mean = 462.516 us, total = 664.636 ms, Queueing time: mean = 93.021 us, max = 23.328 ms, min = 9.730 us, total = 133.671 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 360 total (1 active), Execution time: mean = 2.907 us, total = 1.047 ms, Queueing time: mean = 184.832 us, max = 2.259 ms, min = 6.405 us, total = 66.540 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 360 total (1 active), 
Execution time: mean = 8.971 us, total = 3.230 ms, Queueing time: mean = 180.733 us, max = 2.264 ms, min = 8.861 us, total = 65.064 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 360 total (1 active), Execution time: mean = 16.015 us, total = 5.766 ms, Queueing time: mean = 84.290 us, max = 2.235 ms, min = 5.788 us, total = 30.344 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 360 total (0 active), Execution time: mean = 103.736 us, total = 37.345 ms, Queueing time: mean = 118.067 us, max = 248.219 us, min = 16.203 us, total = 42.504 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 360 total (0 active), Execution time: mean = 643.551 us, total = 231.678 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 121 total (1 active), Execution time: mean = 9.176 us, total = 1.110 ms, Queueing time: mean = 78.100 us, max = 253.487 us, min = 29.037 us, total = 9.450 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 8.155 us, total = 774.725 us, Queueing time: mean = 10.849 s, max = 123.051 s, min = 33.517 us, total = 1030.640 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 929.411 us, total = 68.776 ms, Queueing time: mean = 43.207 us, max = 369.478 us, min = 3.543 us, total = 3.197 ms [state-dump] NodeManager.GcsCheckAlive - 72 total (1 active), Execution time: mean = 290.579 us, total = 20.922 ms, Queueing time: mean = 634.790 us, max = 2.306 ms, min = 107.918 us, total = 45.705 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 72 total (0 active), Execution time: mean = 53.533 us, total = 3.854 ms, Queueing time: mean = 121.400 us, max = 218.783 us, min = 31.716 us, total = 8.741 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 72 total (0 
active), Execution time: mean = 1.537 ms, total = 110.675 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 72 total (1 active), Execution time: mean = 555.561 us, total = 40.000 ms, Queueing time: mean = 373.612 us, max = 1.725 ms, min = 8.885 us, total = 26.900 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 36 total (1 active), Execution time: mean = 1.773 ms, total = 63.831 ms, Queueing time: mean = 66.337 us, max = 153.123 us, min = 21.807 us, total = 2.388 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.574 us, total = 34.627 us, Queueing time: mean = 41.270 us, max = 146.493 us, min = 11.116 us, total = 907.929 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 121.458 us, total = 2.551 ms, Queueing time: mean = 94.581 us, max = 235.076 us, min = 11.959 us, total = 1.986 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 12.500 us, total = 262.503 us, Queueing time: mean = 106.328 us, max = 212.914 us, min = 9.723 us, total = 2.233 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 758.936 us, total = 15.938 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 19.347 us, total = 406.289 us, Queueing time: mean = 139.222 us, max = 517.588 us, min = 35.214 us, total = 2.924 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 197.633 us, total = 2.569 ms, Queueing time: mean = 4.308 ms, max = 15.027 ms, min = 25.785 us, total = 56.007 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 780.093 us, 
total = 7.801 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] - 10 total (0 active), Execution time: mean = 1.025 us, total = 10.254 us, Queueing time: mean = 114.099 us, max = 182.528 us, min = 27.587 us, total = 1.141 ms [state-dump] RaySyncer.BroadcastMessage - 10 total (0 active), Execution time: mean = 221.900 us, total = 2.219 ms, Queueing time: mean = 695.800 ns, max = 931.000 ns, min = 70.000 ns, total = 6.958 us [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 21.932 us, total = 219.320 us, Queueing time: mean = 109.091 us, max = 200.547 us, min = 20.707 us, total = 1.091 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 123.767 us, total = 1.238 ms, Queueing time: mean = 96.345 us, max = 137.794 us, min = 36.307 us, total = 963.446 us [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 635.791 us, total = 6.358 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 115.576 us, total = 1.156 ms, Queueing time: mean = 115.782 us, max = 331.820 us, min = 13.069 us, total = 1.158 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 6 total (1 active, 1 running), Execution time: mean = 1.897 ms, total = 11.382 ms, Queueing time: mean = 55.496 us, max = 111.886 us, min = 27.061 us, total = 332.977 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.646 ms, total = 3.291 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 460.725 
ms, total = 921.449 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.800 us, total = 3.599 us, Queueing time: mean = 209.500 ns, max = 362.000 ns, min = 57.000 ns, total = 419.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 129.215 us, total = 258.430 us, Queueing time: mean = 412.096 us, max = 685.035 us, min = 139.158 us, total = 824.193 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.759 ms, total = 1.759 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.586 ms, total = 1.586 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.546 ms, total = 1.546 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.263 ms, total = 2.263 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 132.115 us, total = 132.115 us, Queueing time: mean = 203.939 us, max = 203.939 us, min = 203.939 us, total = 203.939 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 407.271 us, total = 407.271 us, Queueing time: mean = 44.821 us, max = 44.821 us, min = 44.821 us, total = 44.821 us [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.026 s, total = 1.026 s, Queueing time: mean = 109.121 us, max = 109.121 us, min = 109.121 us, total = 109.121 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 29.389 us, total = 29.389 us, Queueing time: mean = 150.068 us, max = 150.068 us, min = 150.068 us, total = 150.068 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 86.446 us, total = 86.446 us, Queueing time: mean = 403.833 us, max = 403.833 us, min = 403.833 us, total = 403.833 us [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 327.529 us, total = 327.529 us, Queueing time: mean = 115.620 us, max = 115.620 us, min = 115.620 us, total = 115.620 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 60.113 us, total = 60.113 us, Queueing time: mean = 447.798 us, max = 447.798 us, min = 447.798 us, total = 447.798 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.740 ms, total = 1.740 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:00:02,854 I 11815 11843] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in 
use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:00:03,896 I 11815 11815] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, accelerator_type:A40: 10000, node:__internal_head__: 10000, memory: 844922429440000, object_store_memory: 21474836480000, CPU: 200000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2158256074887862688 Local resources: {"total":{node:__internal_head__: [10000], accelerator_type:A40: [10000], GPU: [10000, 10000], CPU: [200000], memory: [844922429440000], object_store_memory: [21474836480000], node:192.168.0.2: [10000]}}, "available": {node:__internal_head__: [10000], accelerator_type:A40: [10000], GPU: [10000, 10000], CPU: [200000], memory: [844922429440000], object_store_memory: [21474836480000], node:192.168.0.2: [10000]}}, "labels":{"ray.io/node_id":"13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2158256074887862688{"total":{object_store_memory: 
21474836480000, CPU: 200000, accelerator_type:A40: 10000, GPU: 20000, node:192.168.0.2: 10000, node:__internal_head__: 10000, memory: 844922429440000}}, "available": {object_store_memory: 21474836480000, CPU: 200000, memory: 844922429440000, GPU: 20000, node:192.168.0.2: 10000, accelerator_type:A40: 10000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] 
Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 36990 total (35 active) [state-dump] Queueing time: mean = 27.948 ms, max 
= 123.051 s, min = 57.000 ns, total = 1033.795 s [state-dump] Execution time: mean = 239.311 us, total = 8.852 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 8810 total (0 active), Execution time: mean = 39.244 us, total = 345.739 ms, Queueing time: mean = 119.954 us, max = 26.128 ms, min = 5.488 us, total = 1.057 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 8810 total (0 active), Execution time: mean = 557.757 us, total = 4.914 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 4196 total (1 active), Execution time: mean = 11.832 us, total = 49.647 ms, Queueing time: mean = 99.683 us, max = 28.199 ms, min = 12.241 us, total = 418.268 ms [state-dump] ObjectManager.UpdateAvailableMemory - 4196 total (0 active), Execution time: mean = 6.159 us, total = 25.842 ms, Queueing time: mean = 116.626 us, max = 706.852 us, min = 3.115 us, total = 489.362 ms [state-dump] NodeManager.CheckGC - 4196 total (1 active), Execution time: mean = 3.055 us, total = 12.821 ms, Queueing time: mean = 107.507 us, max = 28.206 ms, min = 6.199 us, total = 451.100 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 2100 total (1 active), Execution time: mean = 19.128 us, total = 40.169 ms, Queueing time: mean = 78.208 us, max = 1.689 ms, min = 11.310 us, total = 164.237 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 1677 total (1 active), Execution time: mean = 461.300 us, total = 773.600 ms, Queueing time: mean = 90.906 us, max = 23.328 ms, min = 9.730 us, total = 152.449 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 420 total (1 active), Execution time: mean = 2.892 us, total = 1.215 ms, Queueing time: mean = 185.982 us, max = 2.259 ms, min = 6.405 us, total = 78.112 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 420 total (1 active), 
Execution time: mean = 8.952 us, total = 3.760 ms, Queueing time: mean = 181.908 us, max = 2.264 ms, min = 8.861 us, total = 76.402 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 420 total (1 active), Execution time: mean = 15.778 us, total = 6.627 ms, Queueing time: mean = 83.043 us, max = 2.235 ms, min = 5.788 us, total = 34.878 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 420 total (0 active), Execution time: mean = 103.272 us, total = 43.374 ms, Queueing time: mean = 117.168 us, max = 248.219 us, min = 16.203 us, total = 49.210 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 420 total (0 active), Execution time: mean = 641.111 us, total = 269.267 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 141 total (1 active), Execution time: mean = 9.041 us, total = 1.275 ms, Queueing time: mean = 77.589 us, max = 253.487 us, min = 26.576 us, total = 10.940 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 8.155 us, total = 774.725 us, Queueing time: mean = 10.849 s, max = 123.051 s, min = 33.517 us, total = 1030.640 s [state-dump] NodeManager.GcsCheckAlive - 84 total (1 active), Execution time: mean = 290.657 us, total = 24.415 ms, Queueing time: mean = 642.819 us, max = 2.306 ms, min = 107.918 us, total = 53.997 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 84 total (0 active), Execution time: mean = 53.943 us, total = 4.531 ms, Queueing time: mean = 119.305 us, max = 218.783 us, min = 31.716 us, total = 10.022 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 84 total (0 active), Execution time: mean = 1.525 ms, total = 128.138 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 84 total (1 
active), Execution time: mean = 557.079 us, total = 46.795 ms, Queueing time: mean = 379.989 us, max = 1.725 ms, min = 8.885 us, total = 31.919 ms [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 929.411 us, total = 68.776 ms, Queueing time: mean = 43.207 us, max = 369.478 us, min = 3.543 us, total = 3.197 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 42 total (1 active), Execution time: mean = 1.788 ms, total = 75.104 ms, Queueing time: mean = 67.749 us, max = 153.123 us, min = 21.807 us, total = 2.845 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.574 us, total = 34.627 us, Queueing time: mean = 41.270 us, max = 146.493 us, min = 11.116 us, total = 907.929 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 121.458 us, total = 2.551 ms, Queueing time: mean = 94.581 us, max = 235.076 us, min = 11.959 us, total = 1.986 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 12.500 us, total = 262.503 us, Queueing time: mean = 106.328 us, max = 212.914 us, min = 9.723 us, total = 2.233 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 758.936 us, total = 15.938 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 19.347 us, total = 406.289 us, Queueing time: mean = 139.222 us, max = 517.588 us, min = 35.214 us, total = 2.924 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 197.633 us, total = 2.569 ms, Queueing time: mean = 4.308 ms, max = 15.027 ms, min = 25.785 us, total = 56.007 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 780.093 us, 
total = 7.801 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] - 10 total (0 active), Execution time: mean = 1.025 us, total = 10.254 us, Queueing time: mean = 114.099 us, max = 182.528 us, min = 27.587 us, total = 1.141 ms [state-dump] RaySyncer.BroadcastMessage - 10 total (0 active), Execution time: mean = 221.900 us, total = 2.219 ms, Queueing time: mean = 695.800 ns, max = 931.000 ns, min = 70.000 ns, total = 6.958 us [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 21.932 us, total = 219.320 us, Queueing time: mean = 109.091 us, max = 200.547 us, min = 20.707 us, total = 1.091 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 123.767 us, total = 1.238 ms, Queueing time: mean = 96.345 us, max = 137.794 us, min = 36.307 us, total = 963.446 us [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 635.791 us, total = 6.358 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 115.576 us, total = 1.156 ms, Queueing time: mean = 115.782 us, max = 331.820 us, min = 13.069 us, total = 1.158 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 7 total (1 active, 1 running), Execution time: mean = 2.053 ms, total = 14.371 ms, Queueing time: mean = 56.736 us, max = 111.886 us, min = 27.061 us, total = 397.154 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.646 ms, total = 3.291 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 460.725 
ms, total = 921.449 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.800 us, total = 3.599 us, Queueing time: mean = 209.500 ns, max = 362.000 ns, min = 57.000 ns, total = 419.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 129.215 us, total = 258.430 us, Queueing time: mean = 412.096 us, max = 685.035 us, min = 139.158 us, total = 824.193 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.759 ms, total = 1.759 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.586 ms, total = 1.586 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.546 ms, total = 1.546 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.263 ms, total = 2.263 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 132.115 us, total = 132.115 us, Queueing time: mean = 203.939 us, max = 203.939 us, min = 203.939 us, total = 203.939 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 407.271 us, total = 407.271 us, Queueing time: mean = 44.821 us, max = 44.821 us, min = 44.821 us, total = 44.821 us [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.026 s, total = 1.026 s, Queueing time: mean = 109.121 us, max = 109.121 us, min = 109.121 us, total = 109.121 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 29.389 us, total = 29.389 us, Queueing time: mean = 150.068 us, max = 150.068 us, min = 150.068 us, total = 150.068 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 86.446 us, total = 86.446 us, Queueing time: mean = 403.833 us, max = 403.833 us, min = 403.833 us, total = 403.833 us [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 327.529 us, total = 327.529 us, Queueing time: mean = 115.620 us, max = 115.620 us, min = 115.620 us, total = 115.620 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 60.113 us, total = 60.113 us, Queueing time: mean = 447.798 us, max = 447.798 us, min = 447.798 us, total = 447.798 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.740 ms, total = 1.740 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 0 [state-dump] [state-dump] [2025-01-20 23:01:02,854 I 11815 11843] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in 
use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-20 23:01:03,899 I 11815 11815] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, accelerator_type:A40: 10000, node:__internal_head__: 10000, memory: 844922429440000, object_store_memory: 21474836480000, CPU: 200000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -2158256074887862688 Local resources: {"total":{node:__internal_head__: [10000], accelerator_type:A40: [10000], GPU: [10000, 10000], CPU: [200000], memory: [844922429440000], object_store_memory: [21474836480000], node:192.168.0.2: [10000]}}, "available": {node:__internal_head__: [10000], accelerator_type:A40: [10000], GPU: [10000, 10000], CPU: [200000], memory: [844922429440000], object_store_memory: [21474836480000], node:192.168.0.2: [10000]}}, "labels":{"ray.io/node_id":"13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2158256074887862688{"total":{object_store_memory: 
21474836480000, CPU: 200000, accelerator_type:A40: 10000, GPU: 20000, node:192.168.0.2: 10000, node:__internal_head__: 10000, memory: 844922429440000}}, "available": {object_store_memory: 21474836480000, CPU: 200000, memory: 844922429440000, GPU: 20000, node:192.168.0.2: 10000, accelerator_type:A40: 10000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] 
Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 20 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 42218 total (35 active) [state-dump] Queueing time: mean = 24.497 ms, max 
= 123.051 s, min = 57.000 ns, total = 1034.215 s [state-dump] Execution time: mean = 231.936 us, total = 9.792 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 10070 total (0 active), Execution time: mean = 39.005 us, total = 392.782 ms, Queueing time: mean = 118.861 us, max = 26.128 ms, min = 5.488 us, total = 1.197 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 10070 total (0 active), Execution time: mean = 555.690 us, total = 5.596 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.OnDemandBroadcasting - 4795 total (1 active), Execution time: mean = 11.742 us, total = 56.301 ms, Queueing time: mean = 98.792 us, max = 28.199 ms, min = 12.241 us, total = 473.708 ms [state-dump] ObjectManager.UpdateAvailableMemory - 4795 total (0 active), Execution time: mean = 6.182 us, total = 29.641 ms, Queueing time: mean = 116.747 us, max = 706.852 us, min = 3.115 us, total = 559.801 ms [state-dump] NodeManager.CheckGC - 4795 total (1 active), Execution time: mean = 3.053 us, total = 14.641 ms, Queueing time: mean = 106.522 us, max = 28.206 ms, min = 6.199 us, total = 510.774 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 2399 total (1 active), Execution time: mean = 18.966 us, total = 45.499 ms, Queueing time: mean = 78.024 us, max = 1.689 ms, min = 11.310 us, total = 187.180 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 1916 total (1 active), Execution time: mean = 460.952 us, total = 883.183 ms, Queueing time: mean = 89.622 us, max = 23.328 ms, min = 9.730 us, total = 171.717 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 480 total (1 active), Execution time: mean = 2.883 us, total = 1.384 ms, Queueing time: mean = 186.801 us, max = 2.259 ms, min = 6.405 us, total = 89.664 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 480 total (1 active), 
Execution time: mean = 8.924 us, total = 4.283 ms, Queueing time: mean = 182.744 us, max = 2.264 ms, min = 8.861 us, total = 87.717 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 480 total (1 active), Execution time: mean = 15.569 us, total = 7.473 ms, Queueing time: mean = 84.383 us, max = 2.235 ms, min = 5.788 us, total = 40.504 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 479 total (0 active), Execution time: mean = 102.356 us, total = 49.028 ms, Queueing time: mean = 117.593 us, max = 492.037 us, min = 16.203 us, total = 56.327 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 479 total (0 active), Execution time: mean = 637.566 us, total = 305.394 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 161 total (1 active), Execution time: mean = 8.914 us, total = 1.435 ms, Queueing time: mean = 76.558 us, max = 253.487 us, min = 16.912 us, total = 12.326 ms [state-dump] NodeManager.GcsCheckAlive - 96 total (1 active), Execution time: mean = 286.773 us, total = 27.530 ms, Queueing time: mean = 650.950 us, max = 2.306 ms, min = 99.479 us, total = 62.491 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 96 total (0 active), Execution time: mean = 54.090 us, total = 5.193 ms, Queueing time: mean = 118.397 us, max = 218.783 us, min = 27.880 us, total = 11.366 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 96 total (0 active), Execution time: mean = 1.518 ms, total = 145.764 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 96 total (1 active), Execution time: mean = 553.602 us, total = 53.146 ms, Queueing time: mean = 387.062 us, max = 1.725 ms, min = 8.885 us, total = 37.158 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 95 total (21 
active), Execution time: mean = 8.155 us, total = 774.725 us, Queueing time: mean = 10.849 s, max = 123.051 s, min = 33.517 us, total = 1030.640 s [state-dump] ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 929.411 us, total = 68.776 ms, Queueing time: mean = 43.207 us, max = 369.478 us, min = 3.543 us, total = 3.197 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 48 total (1 active), Execution time: mean = 1.797 ms, total = 86.268 ms, Queueing time: mean = 66.964 us, max = 153.123 us, min = 21.807 us, total = 3.214 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.574 us, total = 34.627 us, Queueing time: mean = 41.270 us, max = 146.493 us, min = 11.116 us, total = 907.929 us [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 121.458 us, total = 2.551 ms, Queueing time: mean = 94.581 us, max = 235.076 us, min = 11.959 us, total = 1.986 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 12.500 us, total = 262.503 us, Queueing time: mean = 106.328 us, max = 212.914 us, min = 9.723 us, total = 2.233 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 758.936 us, total = 15.938 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 19.347 us, total = 406.289 us, Queueing time: mean = 139.222 us, max = 517.588 us, min = 35.214 us, total = 2.924 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 197.633 us, total = 2.569 ms, Queueing time: mean = 4.308 ms, max = 15.027 ms, min = 25.785 us, total = 56.007 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 780.093 us, 
total = 7.801 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] - 10 total (0 active), Execution time: mean = 1.025 us, total = 10.254 us, Queueing time: mean = 114.099 us, max = 182.528 us, min = 27.587 us, total = 1.141 ms [state-dump] RaySyncer.BroadcastMessage - 10 total (0 active), Execution time: mean = 221.900 us, total = 2.219 ms, Queueing time: mean = 695.800 ns, max = 931.000 ns, min = 70.000 ns, total = 6.958 us [state-dump] WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 21.932 us, total = 219.320 us, Queueing time: mean = 109.091 us, max = 200.547 us, min = 20.707 us, total = 1.091 ms [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 123.767 us, total = 1.238 ms, Queueing time: mean = 96.345 us, max = 137.794 us, min = 36.307 us, total = 963.446 us [state-dump] NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 635.791 us, total = 6.358 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 115.576 us, total = 1.156 ms, Queueing time: mean = 115.782 us, max = 331.820 us, min = 13.069 us, total = 1.158 ms [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 8 total (1 active, 1 running), Execution time: mean = 1.946 ms, total = 15.566 ms, Queueing time: mean = 52.790 us, max = 111.886 us, min = 25.166 us, total = 422.320 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.646 ms, total = 3.291 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 460.725 
ms, total = 921.449 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.800 us, total = 3.599 us, Queueing time: mean = 209.500 ns, max = 362.000 ns, min = 57.000 ns, total = 419.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 129.215 us, total = 258.430 us, Queueing time: mean = 412.096 us, max = 685.035 us, min = 139.158 us, total = 824.193 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.759 ms, total = 1.759 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.586 ms, total = 1.586 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.546 ms, total = 1.546 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.263 ms, total = 2.263 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 132.115 us, total = 132.115 us, Queueing time: mean = 203.939 us, max = 203.939 us, min = 203.939 us, total = 203.939 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 407.271 us, total = 407.271 us, Queueing time: mean = 44.821 us, max = 44.821 us, min = 44.821 us, total = 44.821 us [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.026 s, total = 1.026 s, Queueing time: mean = 109.121 us, max = 109.121 us, min = 109.121 us, total = 109.121 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 29.389 us, total = 29.389 us, Queueing time: mean = 150.068 us, max = 150.068 us, min = 150.068 us, total = 150.068 us [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 86.446 us, total = 86.446 us, Queueing time: mean = 403.833 us, max = 403.833 us, min = 403.833 us, total = 403.833 us [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 327.529 us, total = 327.529 us, Queueing time: mean = 115.620 us, max = 115.620 us, min = 115.620 us, total = 115.620 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 60.113 us, total = 60.113 us, Queueing time: mean = 447.798 us, max = 447.798 us, min = 447.798 us, total = 447.798 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.740 ms, total = 1.740 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-20 23:01:48,997 I 11815 11815] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=3, has creation task exception = false [2025-01-20 23:01:48,998 I 11815 11815] (raylet) node_manager.cc:1586: Driver (pid=8700) is disconnected. 
worker_id=01000000ffffffffffffffffffffffffffffffffffffffffffffffff job_id=01000000 [2025-01-20 23:01:49,005 I 11815 11815] (raylet) worker_pool.cc:692: Job 01000000 already started in worker pool. [2025-01-20 23:01:49,051 I 11815 11815] (raylet) main.cc:454: received SIGTERM. Existing local drain request = None [2025-01-20 23:01:49,051 I 11815 11815] (raylet) main.cc:255: Raylet graceful shutdown triggered, reason = EXPECTED_TERMINATION, reason message = received SIGTERM [2025-01-20 23:01:49,051 I 11815 11815] (raylet) main.cc:258: Shutting down... [2025-01-20 23:01:49,051 I 11815 11815] (raylet) accessor.cc:510: Unregistering node node_id=13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf [2025-01-20 23:01:49,053 I 11815 11815] (raylet) accessor.cc:523: Finished unregistering node info, status = OK node_id=13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf [2025-01-20 23:01:49,060 I 11815 11815] (raylet) agent_manager.cc:112: Killing agent dashboard_agent/424238335, pid 11907. [2025-01-20 23:01:49,071 I 11815 11908] (raylet) agent_manager.cc:79: Agent process with name dashboard_agent/424238335 exited, exit code 0. [2025-01-20 23:01:49,071 I 11815 11815] (raylet) agent_manager.cc:112: Killing agent runtime_env_agent, pid 11909. [2025-01-20 23:01:49,079 I 11815 11910] (raylet) agent_manager.cc:79: Agent process with name runtime_env_agent exited, exit code 0. [2025-01-20 23:01:49,080 I 11815 11815] (raylet) io_service_pool.cc:47: IOServicePool is stopped. [2025-01-20 23:01:49,207 I 11815 11815] (raylet) stats.h:120: Stats module has shutdown.