[2025-01-21 06:03:00,426 I 22250 22250] (raylet) main.cc:180: Setting cluster ID to: b89f5df64581a09eded9a19cd2dde8d34725a01ff053e2fab1e5f979
[2025-01-21 06:03:00,433 I 22250 22250] (raylet) main.cc:289: Raylet is not set to kill unknown children.
[2025-01-21 06:03:00,433 I 22250 22250] (raylet) io_service_pool.cc:35: IOServicePool is running with 1 io_service.
[2025-01-21 06:03:00,434 I 22250 22250] (raylet) main.cc:419: Setting node ID node_id=959d61ca307c5c3e7967dc4f62c340eb34c700e436feb4955e4f5877
[2025-01-21 06:03:00,434 I 22250 22250] (raylet) store_runner.cc:32: Allowing the Plasma store to use up to 2.14748GB of memory.
[2025-01-21 06:03:00,434 I 22250 22250] (raylet) store_runner.cc:48: Starting object store with directory /dev/shm, fallback /tmp/ray, and huge page support disabled
[2025-01-21 06:03:00,434 I 22250 22279] (raylet) dlmalloc.cc:154: create_and_mmap_buffer(2147483656, /dev/shm/plasmaXXXXXX)
[2025-01-21 06:03:00,435 I 22250 22279] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB
- num bytes created total: 0
0 pending objects of total size 0MB
- objects spillable: 0
- bytes spillable: 0
- objects unsealed: 0
- bytes unsealed: 0
- objects in use: 0
- bytes in use: 0
- objects evictable: 0
- bytes evictable: 0
- objects created by worker: 0
- bytes created by worker: 0
- objects restored: 0
- bytes restored: 0
- objects received: 0
- bytes received: 0
- objects errored: 0
- bytes errored: 0
[2025-01-21 06:03:01,438 I 22250 22250] (raylet) grpc_server.cc:134: ObjectManager server started, listening on port 34289.
[2025-01-21 06:03:01,441 I 22250 22250] (raylet) worker_killing_policy.cc:101: Running GroupByOwner policy.
[2025-01-21 06:03:01,442 I 22250 22250] (raylet) memory_monitor.cc:47: MemoryMonitor initialized with usage threshold at 94999994368 bytes (0.95 system memory), total system memory bytes: 99999997952
[2025-01-21 06:03:01,442 I 22250 22250] (raylet) node_manager.cc:287: Initializing NodeManager node_id=959d61ca307c5c3e7967dc4f62c340eb34c700e436feb4955e4f5877
[2025-01-21 06:03:01,443 I 22250 22250] (raylet) grpc_server.cc:134: NodeManager server started, listening on port 33137.
[2025-01-21 06:03:01,450 I 22250 22343] (raylet) agent_manager.cc:77: Monitor agent process with name dashboard_agent/424238335
[2025-01-21 06:03:01,451 I 22250 22250] (raylet) event.cc:493: Ray Event initialized for RAYLET
[2025-01-21 06:03:01,451 I 22250 22250] (raylet) event.cc:324: Set ray event level to warning
[2025-01-21 06:03:01,451 I 22250 22345] (raylet) agent_manager.cc:77: Monitor agent process with name runtime_env_agent
[2025-01-21 06:03:01,453 I 22250 22250] (raylet) raylet.cc:134: Raylet of id, 959d61ca307c5c3e7967dc4f62c340eb34c700e436feb4955e4f5877 started. Raylet consists of node_manager and object_manager.
node_manager address: 192.168.0.2:33137 object_manager address: 192.168.0.2:34289 hostname: 0cd925b1f73b
[2025-01-21 06:03:01,455 I 22250 22250] (raylet) node_manager.cc:525: [state-dump] NodeManager:
[state-dump] Node ID: 959d61ca307c5c3e7967dc4f62c340eb34c700e436feb4955e4f5877
[state-dump] Node name: 192.168.0.2
[state-dump] InitialConfigResources: {node:192.168.0.2: 10000, accelerator_type:A40: 10000, node:__internal_head__: 10000, memory: 779659989000000, object_store_memory: 21474836480000, CPU: 200000, GPU: 20000}
[state-dump] ClusterTaskManager:
[state-dump] ========== Node: 959d61ca307c5c3e7967dc4f62c340eb34c700e436feb4955e4f5877 =================
[state-dump] Infeasible queue length: 0
[state-dump] Schedule queue length: 0
[state-dump] Dispatch queue length: 0
[state-dump] num_waiting_for_resource: 0
[state-dump] num_waiting_for_plasma_memory: 0
[state-dump] num_waiting_for_remote_node_resources: 0
[state-dump] num_worker_not_started_by_job_config_not_exist: 0
[state-dump] num_worker_not_started_by_registration_timeout: 0
[state-dump] num_tasks_waiting_for_workers: 0
[state-dump] num_cancelled_tasks: 0
[state-dump] cluster_resource_scheduler state:
[state-dump] Local id: -3074196584474872412 Local resources: {"total":{node:__internal_head__: [10000], node:192.168.0.2: [10000], GPU: [10000, 10000], CPU: [200000], memory: [779659989000000], object_store_memory: [21474836480000], accelerator_type:A40: [10000]}}, "available": {node:__internal_head__: [10000], node:192.168.0.2: [10000], GPU: [10000, 10000], CPU: [200000], memory: [779659989000000], object_store_memory: [21474836480000], accelerator_type:A40: [10000]}}, "labels":{"ray.io/node_id":"959d61ca307c5c3e7967dc4f62c340eb34c700e436feb4955e4f5877",} is_draining: 0 is_idle: 1 Cluster resources: node id: -3074196584474872412{"total":{object_store_memory: 21474836480000, memory: 779659989000000, node:192.168.0.2: 10000, node:__internal_head__: 10000, accelerator_type:A40: 10000, GPU: 20000, CPU: 200000}}, "available": {object_store_memory: 21474836480000, memory: 779659989000000, node:192.168.0.2: 10000, node:__internal_head__: 10000, accelerator_type:A40: 10000, GPU: 20000, CPU: 200000}}, "labels":{"ray.io/node_id":"959d61ca307c5c3e7967dc4f62c340eb34c700e436feb4955e4f5877",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []}
[state-dump] Waiting tasks size: 0
[state-dump] Number of executing tasks: 0
[state-dump] Number of pinned task arguments: 0
[state-dump] Number of total spilled tasks: 0
[state-dump] Number of spilled waiting tasks: 0
[state-dump] Number of spilled unschedulable tasks: 0
[state-dump] Resource usage {
[state-dump] }
[state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}:
[state-dump]
[state-dump] Running tasks by scheduling class:
[state-dump] ==================================================
[state-dump]
[state-dump] ClusterResources:
[state-dump] LocalObjectManager:
[state-dump] - num pinned objects: 0
[state-dump] - pinned objects size: 0
[state-dump] - num objects pending restore: 0
[state-dump] - num objects pending spill: 0
[state-dump] - num bytes pending spill: 0
[state-dump] - num bytes currently spilled: 0
[state-dump] - cumulative spill requests: 0
[state-dump] - cumulative restore requests: 0
[state-dump] - spilled objects pending delete: 0
[state-dump]
[state-dump] ObjectManager:
[state-dump] - num local objects: 0
[state-dump] - num unfulfilled push requests: 0
[state-dump] - num object pull requests: 0
[state-dump] - num chunks received total: 0
[state-dump] - num chunks received failed (all): 0
[state-dump] - num chunks received failed / cancelled: 0
[state-dump] - num chunks received failed / plasma error: 0
[state-dump] Event stats:
[state-dump] Global stats: 0 total (0 active)
[state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] Execution time: mean = -nan s, total = 0.000 s
[state-dump] Event stats:
[state-dump] PushManager:
[state-dump] - num pushes in flight: 0
[state-dump] - num chunks in flight: 0
[state-dump] - num chunks remaining: 0
[state-dump] - max chunks allowed: 409
[state-dump] OwnershipBasedObjectDirectory:
[state-dump] - num listeners: 0
[state-dump] - cumulative location updates: 0
[state-dump] - num location updates per second: 70169718562524000.000
[state-dump] - num location lookups per second: 70169718562512000.000
[state-dump] - num locations added per second: 0.000
[state-dump] - num locations removed per second: 0.000
[state-dump] BufferPool:
[state-dump] - create buffer state map size: 0
[state-dump] PullManager:
[state-dump] - num bytes available for pulled objects: 2147483648
[state-dump] - num bytes being pulled (all): 0
[state-dump] - num bytes being pulled / pinned: 0
[state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - first get request bundle: N/A
[state-dump] - first wait request bundle: N/A
[state-dump] - first task request bundle: N/A
[state-dump] - num objects queued: 0
[state-dump] - num objects actively pulled (all): 0
[state-dump] - num objects actively pulled / pinned: 0
[state-dump] - num bundles being pulled: 0
[state-dump] - num pull retries: 0
[state-dump] - max timeout seconds: 0
[state-dump] - max timeout request is already processed. No entry.
[state-dump]
[state-dump] WorkerPool:
[state-dump] - registered jobs: 0
[state-dump] - process_failed_job_config_missing: 0
[state-dump] - process_failed_rate_limited: 0
[state-dump] - process_failed_pending_registration: 0
[state-dump] - process_failed_runtime_env_setup_failed: 0
[state-dump] - num PYTHON workers: 0
[state-dump] - num PYTHON drivers: 0
[state-dump] - num PYTHON pending start requests: 0
[state-dump] - num PYTHON pending registration requests: 0
[state-dump] - num object spill callbacks queued: 0
[state-dump] - num object restore queued: 0
[state-dump] - num util functions queued: 0
[state-dump] - num idle workers: 0
[state-dump] TaskDependencyManager:
[state-dump] - task deps map size: 0
[state-dump] - get req map size: 0
[state-dump] - wait req map size: 0
[state-dump] - local objects map size: 0
[state-dump] WaitManager:
[state-dump] - num active wait requests: 0
[state-dump] Subscriber:
[state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] Channel WORKER_OBJECT_EVICTION
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] Channel WORKER_REF_REMOVED_CHANNEL
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] num async plasma notifications: 0
[state-dump] Remote node managers:
[state-dump] Event stats:
[state-dump] Global stats: 28 total (13 active)
[state-dump] Queueing time: mean = 1.197 ms, max = 9.306 ms, min = 12.808 us, total = 33.510 ms
[state-dump] Execution time: mean = 36.569 ms, total = 1.024 s
[state-dump] Event stats:
[state-dump] PeriodicalRunner.RunFnPeriodically - 11 total (2 active, 1 running), Execution time: mean = 146.873 us, total = 1.616 ms, Queueing time: mean = 3.042 ms, max = 9.306 ms, min = 19.328 us, total = 33.463 ms
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.599 ms, total = 1.599 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManager.deadline_timer.record_metrics - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ObjectManager.UpdateAvailableMemory - 1 total (0 active), Execution time: mean = 1.661 us, total = 1.661 us, Queueing time: mean = 16.773 us, max = 16.773 us, min = 16.773 us, total = 16.773 us
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.018 s, total = 1.018 s, Queueing time: mean = 12.808 us, max = 12.808 us, min = 12.808 us, total = 12.808 us
[state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManager.deadline_timer.flush_free_objects - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManager.ScheduleAndDispatchTasks - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 212.923 us, total = 212.923 us, Queueing time: mean = 17.139 us, max = 17.139 us, min = 17.139 us, total = 17.139 us
[state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManager.deadline_timer.debug_state_dump - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ClusterResourceManager.ResetRemoteNodeView - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 1 total (0 active), Execution time: mean = 1.102 ms, total = 1.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.415 ms, total = 1.415 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] DebugString() time ms: 0
[state-dump]
[state-dump]
[2025-01-21 06:03:01,456 I 22250 22250] (raylet) accessor.cc:762: Received notification for node, IsAlive = 1 node_id=959d61ca307c5c3e7967dc4f62c340eb34c700e436feb4955e4f5877
[2025-01-21 06:03:01,488 I 22250 22250] (raylet) worker_pool.cc:501: Started worker process with pid 22382, the token is 0
[2025-01-21 06:03:01,490 I 22250 22250] (raylet) worker_pool.cc:501: Started worker process with pid 22383, the token is 1
[2025-01-21 06:03:01,492 I 22250 22250] (raylet) worker_pool.cc:501: Started worker process with pid 22384, the token is 2
[2025-01-21 06:03:01,494 I 22250 22250] (raylet) worker_pool.cc:501: Started worker process with pid 22385, the token is 3
[2025-01-21 06:03:01,496 I 22250 22250] (raylet) worker_pool.cc:501: Started worker process with pid 22386, the token is 4
[2025-01-21 06:03:01,498 I 22250 22250] (raylet) worker_pool.cc:501: Started worker process with pid 22387, the token is 5
[2025-01-21 06:03:01,500 I 22250 22250] (raylet) worker_pool.cc:501: Started worker process with pid 22388, the token is 6
[2025-01-21 06:03:01,502 I 22250 22250] (raylet) worker_pool.cc:501: Started worker process with pid 22389, the token is 7
[2025-01-21 06:03:01,504 I 22250 22250] (raylet) worker_pool.cc:501: Started worker process with pid 22390, the token is 8
[2025-01-21 06:03:01,506 I 22250 22250] (raylet) worker_pool.cc:501: Started worker process with pid 22391, the token is 9
[2025-01-21 06:03:01,508 I 22250 22250] (raylet) worker_pool.cc:501: Started worker process with pid 22392, the token is 10
[2025-01-21 06:03:01,510 I 22250 22250] (raylet) worker_pool.cc:501: Started worker process with pid 22393, the token is 11
[2025-01-21 06:03:01,512 I 22250 22250] (raylet) worker_pool.cc:501: Started worker process with pid 22394, the token is 12
[2025-01-21 06:03:01,514 I 22250 22250] (raylet) worker_pool.cc:501: Started worker process with pid 22395, the token is 13
[2025-01-21 06:03:01,515 I 22250 22250] (raylet) worker_pool.cc:501: Started worker process with pid 22396, the token is 14
[2025-01-21 06:03:01,517 I 22250 22250] (raylet) worker_pool.cc:501: Started worker process with pid 22397, the token is 15
[2025-01-21 06:03:01,519 I 22250 22250] (raylet) worker_pool.cc:501: Started worker process with pid 22398, the token is 16
[2025-01-21 06:03:01,521 I 22250 22250] (raylet) worker_pool.cc:501: Started worker process with pid 22399, the token is 17
[2025-01-21 06:03:01,523 I 22250 22250] (raylet) worker_pool.cc:501: Started worker process with pid 22400, the token is 18
[2025-01-21 06:03:01,525 I 22250 22250] (raylet) worker_pool.cc:501: Started worker process with pid 22401, the token is 19
[2025-01-21 06:03:02,112 I 22250 22279] (raylet) object_store.cc:35: Object store current usage 8e-09 / 2.14748 GB.
[2025-01-21 06:03:02,358 I 22250 22250] (raylet) worker_pool.cc:692: Job 01000000 already started in worker pool.
[2025-01-21 06:03:10,459 W 22250 22273] (raylet) metric_exporter.cc:105: [1] Export metrics to agent failed: RpcError: RPC Error message: failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:50564: Failed to connect to remote host: Connection refused; RPC Error details: . This won't affect Ray, but you can lose metrics from the cluster.
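Editor's note on the warning above: it only means the raylet could not reach the local metrics/dashboard agent on 127.0.0.1:50564 (connection refused); as the message itself says, tasks keep running and only cluster metrics may be lost. A minimal diagnostic sketch, assuming the port from the warning and using only the Python standard library (the helper name is hypothetical, not a Ray API):

    import socket

    def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
        """Return True if a TCP connection to (host, port) succeeds."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    # Port taken from the "Connection refused" warning in the log above.
    print(port_is_open("127.0.0.1", 50564))

If nothing is listening there, the agent process on this node likely exited or never came up; the raylet itself is unaffected.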
[2025-01-21 06:04:00,435 I 22250 22279] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB
- num bytes created total: 168
0 pending objects of total size 0MB
- objects spillable: 0
- bytes spillable: 0
- objects unsealed: 0
- bytes unsealed: 0
- objects in use: 0
- bytes in use: 0
- objects evictable: 0
- bytes evictable: 0
- objects created by worker: 0
- bytes created by worker: 0
- objects restored: 0
- bytes restored: 0
- objects received: 0
- bytes received: 0
- objects errored: 0
- bytes errored: 0
[2025-01-21 06:04:01,457 I 22250 22250] (raylet) node_manager.cc:525: [state-dump] NodeManager:
[state-dump] Node ID: 959d61ca307c5c3e7967dc4f62c340eb34c700e436feb4955e4f5877
[state-dump] Node name: 192.168.0.2
[state-dump] InitialConfigResources: {node:192.168.0.2: 10000, accelerator_type:A40: 10000, node:__internal_head__: 10000, memory: 779659989000000, object_store_memory: 21474836480000, CPU: 200000, GPU: 20000}
[state-dump] ClusterTaskManager:
[state-dump] ========== Node: 959d61ca307c5c3e7967dc4f62c340eb34c700e436feb4955e4f5877 =================
[state-dump] Infeasible queue length: 0
[state-dump] Schedule queue length: 0
[state-dump] Dispatch queue length: 0
[state-dump] num_waiting_for_resource: 0
[state-dump] num_waiting_for_plasma_memory: 0
[state-dump] num_waiting_for_remote_node_resources: 0
[state-dump] num_worker_not_started_by_job_config_not_exist: 0
[state-dump] num_worker_not_started_by_registration_timeout: 0
[state-dump] num_tasks_waiting_for_workers: 0
[state-dump] num_cancelled_tasks: 0
[state-dump] cluster_resource_scheduler state:
[state-dump] Local id: -3074196584474872412 Local resources: {"total":{node:__internal_head__: [10000], node:192.168.0.2: [10000], GPU: [10000, 10000], CPU: [200000], memory: [779659989000000], object_store_memory: [21474836480000], accelerator_type:A40: [10000]}}, "available": {node:__internal_head__: [10000], node:192.168.0.2: [10000], GPU: [10000, 10000], CPU: [200000], memory: [779659989000000], object_store_memory: [21474836480000], accelerator_type:A40: [10000]}}, "labels":{"ray.io/node_id":"959d61ca307c5c3e7967dc4f62c340eb34c700e436feb4955e4f5877",} is_draining: 0 is_idle: 1 Cluster resources: node id: -3074196584474872412{"total":{node:192.168.0.2: 10000, GPU: 20000, memory: 779659989000000, accelerator_type:A40: 10000, node:__internal_head__: 10000, CPU: 200000, object_store_memory: 21474836480000}}, "available": {node:192.168.0.2: 10000, accelerator_type:A40: 10000, memory: 779659989000000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, GPU: 20000}}, "labels":{"ray.io/node_id":"959d61ca307c5c3e7967dc4f62c340eb34c700e436feb4955e4f5877",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []}
[state-dump] Waiting tasks size: 0
[state-dump] Number of executing tasks: 0
[state-dump] Number of pinned task arguments: 0
[state-dump] Number of total spilled tasks: 0
[state-dump] Number of spilled waiting tasks: 0
[state-dump] Number of spilled unschedulable tasks: 0
[state-dump] Resource usage {
[state-dump] }
[state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}:
[state-dump]
[state-dump] Running tasks by scheduling class:
[state-dump] ==================================================
[state-dump]
[state-dump] ClusterResources:
[state-dump] LocalObjectManager:
[state-dump] - num pinned objects: 0
[state-dump] - pinned objects size: 0
[state-dump] - num objects pending restore: 0
[state-dump] - num objects pending spill: 0
[state-dump] - num bytes pending spill: 0
[state-dump] - num bytes currently spilled: 0
[state-dump] - cumulative spill requests: 0
[state-dump] - cumulative restore requests: 0
[state-dump] - spilled objects pending delete: 0
[state-dump]
[state-dump] ObjectManager:
[state-dump] - num local objects: 0
[state-dump] - num unfulfilled push requests: 0
[state-dump] - num object pull requests: 0
[state-dump] - num chunks received total: 0
[state-dump] - num chunks received failed (all): 0
[state-dump] - num chunks received failed / cancelled: 0
[state-dump] - num chunks received failed / plasma error: 0
[state-dump] Event stats:
[state-dump] Global stats: 0 total (0 active)
[state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] Execution time: mean = -nan s, total = 0.000 s
[state-dump] Event stats:
[state-dump] PushManager:
[state-dump] - num pushes in flight: 0
[state-dump] - num chunks in flight: 0
[state-dump] - num chunks remaining: 0
[state-dump] - max chunks allowed: 409
[state-dump] OwnershipBasedObjectDirectory:
[state-dump] - num listeners: 0
[state-dump] - cumulative location updates: 0
[state-dump] - num location updates per second: 0.000
[state-dump] - num location lookups per second: 0.000
[state-dump] - num locations added per second: 0.000
[state-dump] - num locations removed per second: 0.000
[state-dump] BufferPool:
[state-dump] - create buffer state map size: 0
[state-dump] PullManager:
[state-dump] - num bytes available for pulled objects: 2147483648
[state-dump] - num bytes being pulled (all): 0
[state-dump] - num bytes being pulled / pinned: 0
[state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
[state-dump] - first get request bundle: N/A
[state-dump] - first wait request bundle: N/A
[state-dump] - first task request bundle: N/A
[state-dump] - num objects queued: 0
[state-dump] - num objects actively pulled (all): 0
[state-dump] - num objects actively pulled / pinned: 0
[state-dump] - num bundles being pulled: 0
[state-dump] - num pull retries: 0
[state-dump] - max timeout seconds: 0
[state-dump] - max timeout request is already processed. No entry.
[state-dump]
[state-dump] WorkerPool:
[state-dump] - registered jobs: 1
[state-dump] - process_failed_job_config_missing: 0
[state-dump] - process_failed_rate_limited: 0
[state-dump] - process_failed_pending_registration: 0
[state-dump] - process_failed_runtime_env_setup_failed: 0
[state-dump] - num PYTHON workers: 20
[state-dump] - num PYTHON drivers: 1
[state-dump] - num PYTHON pending start requests: 0
[state-dump] - num PYTHON pending registration requests: 0
[state-dump] - num object spill callbacks queued: 0
[state-dump] - num object restore queued: 0
[state-dump] - num util functions queued: 0
[state-dump] - num idle workers: 20
[state-dump] TaskDependencyManager:
[state-dump] - task deps map size: 0
[state-dump] - get req map size: 0
[state-dump] - wait req map size: 0
[state-dump] - local objects map size: 0
[state-dump] WaitManager:
[state-dump] - num active wait requests: 0
[state-dump] Subscriber:
[state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] Channel WORKER_OBJECT_EVICTION
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] Channel WORKER_REF_REMOVED_CHANNEL
[state-dump] - cumulative subscribe requests: 0
[state-dump] - cumulative unsubscribe requests: 0
[state-dump] - active subscribed publishers: 0
[state-dump] - cumulative published messages: 0
[state-dump] - cumulative processed messages: 0
[state-dump] num async plasma notifications: 0
[state-dump] Remote node managers:
[state-dump] Event stats:
[state-dump] Global stats: 5538 total (35 active)
[state-dump] Queueing time: mean = 6.404 ms, max = 25.116 s, min = 57.000 ns, total = 35.466 s
[state-dump] Execution time: mean = 510.799 us, total = 2.829 s
[state-dump] Event stats:
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 1260 total (0 active), Execution time: mean = 464.660 us, total = 585.472 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 1260 total (0 active), Execution time: mean = 34.055 us, total = 42.910 ms, Queueing time: mean = 92.372 us, max = 396.025 us, min = 4.093 us, total = 116.388 ms
[state-dump] NodeManager.CheckGC - 600 total (1 active), Execution time: mean = 3.105 us, total = 1.863 ms, Queueing time: mean = 78.095 us, max = 2.991 ms, min = 8.473 us, total = 46.857 ms
[state-dump] ObjectManager.UpdateAvailableMemory - 600 total (0 active), Execution time: mean = 5.258 us, total = 3.155 ms, Queueing time: mean = 88.884 us, max = 375.599 us, min = 5.091 us, total = 53.330 ms
[state-dump] RaySyncer.OnDemandBroadcasting - 600 total (1 active), Execution time: mean = 11.912 us, total = 7.147 ms, Queueing time: mean = 70.762 us, max = 2.979 ms, min = 10.189 us, total = 42.457 ms
[state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 300 total (1 active), Execution time: mean = 18.544 us, total = 5.563 ms, Queueing time: mean = 70.306 us, max = 648.172 us, min = 12.544 us, total = 21.092 ms
[state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 240 total (1 active), Execution time: mean = 436.074 us, total = 104.658 ms, Queueing time: mean = 70.864 us, max = 1.639 ms, min = 19.225 us, total = 17.007 ms
[state-dump] ClientConnection.async_read.ProcessMessageHeader - 86 total (21 active), Execution time: mean = 4.687 us, total = 403.106 us, Queueing time: mean = 407.886 ms, max = 25.116 s, min = 16.268 us, total = 35.078 s
[state-dump] ClientConnection.async_read.ProcessMessage - 65 total (0 active), Execution time: mean = 750.865 us, total = 48.806 ms, Queueing time: mean = 28.560 us, max = 397.692 us, min = 2.888 us, total = 1.856 ms
[state-dump] NodeManager.ScheduleAndDispatchTasks - 60 total (1 active), Execution time: mean = 14.731 us, total = 883.886 us, Queueing time: mean = 66.990 us, max = 167.549 us, min = 20.155 us, total = 4.019 ms
[state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 60 total (0 active), Execution time: mean = 102.042 us, total = 6.123 ms, Queueing time: mean = 82.329 us, max = 181.150 us, min = 13.512 us, total = 4.940 ms
[state-dump] NodeManagerService.grpc_server.GetResourceLoad - 60 total (0 active), Execution time: mean = 549.440 us, total = 32.966 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManager.deadline_timer.flush_free_objects - 60 total (1 active), Execution time: mean = 7.329 us, total = 439.758 us, Queueing time: mean = 173.334 us, max = 1.546 ms, min = 12.114 us, total = 10.400 ms
[state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 60 total (1 active), Execution time: mean = 2.810 us, total = 168.621 us, Queueing time: mean = 176.670 us, max = 1.542 ms, min = 9.629 us, total = 10.600 ms
[state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 977.045 ns, total = 21.495 us, Queueing time: mean = 37.996 us, max = 138.120 us, min = 9.530 us, total = 835.918 us
[state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 10.226 us, total = 214.738 us, Queueing time: mean = 113.424 us, max = 356.656 us, min = 13.491 us, total = 2.382 ms
[state-dump] ClusterResourceManager.ResetRemoteNodeView - 21 total (1 active), Execution time: mean = 8.918 us, total = 187.281 us, Queueing time: mean = 70.133 us, max = 147.205 us, min = 39.639 us, total = 1.473 ms
[state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 16.456 us, total = 345.584 us, Queueing time: mean = 89.691 us, max = 194.377 us, min = 20.318 us, total = 1.884 ms
[state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 53.386 us, total = 1.121 ms, Queueing time: mean = 49.966 us, max = 200.350 us, min = 3.857 us, total = 1.049 ms
[state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 720.054 us, total = 15.121 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 169.893 us, total = 2.209 ms, Queueing time: mean = 2.849 ms, max = 9.306 ms, min = 19.328 us, total = 37.031 ms
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 12 total (0 active), Execution time: mean = 1.311 ms, total = 15.729 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 12 total (0 active), Execution time: mean = 44.350 us, total = 532.199 us, Queueing time: mean = 95.656 us, max = 150.995 us, min = 12.388 us, total = 1.148 ms
[state-dump] NodeManager.deadline_timer.record_metrics - 12 total (1 active), Execution time: mean = 530.125 us, total = 6.362 ms, Queueing time: mean = 332.377 us, max = 973.453 us, min = 24.971 us, total = 3.989 ms
[state-dump] NodeManager.GcsCheckAlive - 12 total (1 active), Execution time: mean = 263.009 us, total = 3.156 ms, Queueing time: mean = 578.199 us, max = 1.238 ms, min = 249.805 us, total = 6.938 ms
[state-dump] NodeManager.deadline_timer.debug_state_dump - 6 total (1 active), Execution time: mean = 1.580 ms, total = 9.481 ms, Queueing time: mean = 50.409 us, max = 79.074 us, min = 19.740 us, total = 302.453 us
[state-dump] - 3 total (0 active), Execution time: mean = 461.667 ns, total = 1.385 us, Queueing time: mean = 69.898 us, max = 178.722 us, min = 10.326 us, total = 209.695 us
[state-dump] RaySyncer.BroadcastMessage - 3 total (0 active), Execution time: mean = 137.253 us, total = 411.758 us, Queueing time: mean = 349.667 ns, max = 667.000 ns, min = 77.000 ns, total = 1.049 us
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 106.888 us, total = 213.777 us, Queueing time: mean = 535.597 us, max = 1.063 ms, min = 8.076 us, total = 1.071 ms
[state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.131 us, total = 2.263 us, Queueing time: mean = 169.500 ns, max = 282.000 ns, min = 57.000 ns, total = 339.000 ns
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.056 ms, total = 2.112 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 452.461 ms, total = 904.923 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 878.054 us, total = 878.054 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 9.117 us, total = 9.117 us, Queueing time: mean = 8.677 us, max = 8.677 us, min = 8.677 us, total = 8.677 us
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 777.793 us, total = 777.793 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.123 ms, total = 1.123 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.415 ms, total = 1.415 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 1 total (0 active), Execution time: mean = 325.413 us, total = 325.413 us, Queueing time: mean = 141.179 us, max = 141.179 us, min = 141.179 us, total = 141.179 us
[state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 212.923 us, total = 212.923 us, Queueing time: mean = 17.139 us, max = 17.139 us, min = 17.139 us, total = 17.139 us
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 1 total (0 active), Execution time: mean = 938.179 us, total = 938.179 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 106.078 us, total = 106.078 us, Queueing time: mean = 9.662 us, max = 9.662 us, min = 9.662 us, total = 9.662 us
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.018 s, total = 1.018 s, Queueing time: mean = 12.808 us, max = 12.808 us, min = 12.808 us, total = 12.808 us
[state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 67.617 us, total = 67.617 us, Queueing time: mean = 273.520 us, max = 273.520 us, min = 273.520 us, total = 273.520 us
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 237.927 us, total = 237.927 us, Queueing time: mean = 89.366 us, max = 89.366 us, min = 89.366 us, total = 89.366 us
[state-dump] NodeManagerService.grpc_server.ReturnWorker - 1 total (0 active), Execution time: mean = 283.702 us, total = 283.702 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 44.565 us, total = 44.565 us, Queueing time: mean = 308.871 us, max = 308.871 us, min = 308.871 us, total = 308.871 us
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.599 ms, total = 1.599 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] WorkerPool.PopWorkerCallback - 1 total (0 active), Execution time: mean = 46.270 us, total = 46.270 us, Queueing time: mean = 33.074 us, max = 33.074 us, min = 33.074 us, total = 33.074 us
[state-dump] NodeManager.deadline_timer.print_event_loop_stats - 1 total (1 active, 1 running), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 1 total (0 active), Execution time: mean = 65.318 us, total = 65.318 us, Queueing time: mean = 15.801 us, max = 15.801 us, min = 15.801 us, total = 15.801 us
[state-dump] DebugString() time ms: 2
[state-dump]
[state-dump]
[2025-01-21 06:04:08,207 I 22250 22250] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=3, has creation task exception = false
[2025-01-21 06:04:08,207 I 22250 22250] (raylet) node_manager.cc:1586: Driver (pid=21983) is disconnected. worker_id=01000000ffffffffffffffffffffffffffffffffffffffffffffffff job_id=01000000
[2025-01-21 06:04:08,214 I 22250 22250] (raylet) worker_pool.cc:692: Job 01000000 already started in worker pool.
[2025-01-21 06:04:08,258 I 22250 22250] (raylet) main.cc:454: received SIGTERM. Existing local drain request = None
[2025-01-21 06:04:08,258 I 22250 22250] (raylet) main.cc:255: Raylet graceful shutdown triggered, reason = EXPECTED_TERMINATION, reason message = received SIGTERM
[2025-01-21 06:04:08,258 I 22250 22250] (raylet) main.cc:258: Shutting down...
[2025-01-21 06:04:08,259 I 22250 22250] (raylet) accessor.cc:510: Unregistering node node_id=959d61ca307c5c3e7967dc4f62c340eb34c700e436feb4955e4f5877
[2025-01-21 06:04:08,261 I 22250 22250] (raylet) accessor.cc:523: Finished unregistering node info, status = OK node_id=959d61ca307c5c3e7967dc4f62c340eb34c700e436feb4955e4f5877
[2025-01-21 06:04:08,264 I 22250 22250] (raylet) agent_manager.cc:112: Killing agent dashboard_agent/424238335, pid 22342.
[2025-01-21 06:04:08,277 I 22250 22343] (raylet) agent_manager.cc:79: Agent process with name dashboard_agent/424238335 exited, exit code 0.
[2025-01-21 06:04:08,277 I 22250 22250] (raylet) agent_manager.cc:112: Killing agent runtime_env_agent, pid 22344.
[2025-01-21 06:04:08,285 I 22250 22345] (raylet) agent_manager.cc:79: Agent process with name runtime_env_agent exited, exit code 0.
[2025-01-21 06:04:08,286 I 22250 22250] (raylet) io_service_pool.cc:47: IOServicePool is stopped.
[2025-01-21 06:04:08,324 I 22250 22250] (raylet) stats.h:120: Stats module has shutdown.
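Editor's note: the tail of the log is a normal teardown rather than a crash: the driver (pid 21983) disconnects, the raylet then receives SIGTERM, unregisters the node, and both agents exit with code 0. A minimal sketch, assuming a single node started with `ray start --head`, of a driver session whose clean exit followed by `ray stop` would be expected to produce this kind of sequence (illustrative only, not taken from this log):

    import ray

    # Connect to the already-running local cluster as a driver.
    ray.init(address="auto")

    @ray.remote
    def ping() -> str:
        return "pong"

    print(ray.get(ping.remote()))

    # Clean driver exit corresponds to a "Driver (pid=...) is disconnected." raylet log line.
    ray.shutdown()

Running `ray stop` on the node afterwards terminates the raylet with SIGTERM, which matches the graceful-shutdown entries above.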