|
[2025-01-21 05:47:35,708 I 18747 18747] (raylet) main.cc:180: Setting cluster ID to: cf4503ed329bc2b5612e9c435582c6da51e2e7d5bac7b8183e0cfd01 |
|
[2025-01-21 05:47:35,718 I 18747 18747] (raylet) main.cc:289: Raylet is not set to kill unknown children. |
|
[2025-01-21 05:47:35,718 I 18747 18747] (raylet) io_service_pool.cc:35: IOServicePool is running with 1 io_service. |
|
[2025-01-21 05:47:35,718 I 18747 18747] (raylet) main.cc:419: Setting node ID node_id=381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d |
|
[2025-01-21 05:47:35,718 I 18747 18747] (raylet) store_runner.cc:32: Allowing the Plasma store to use up to 2.14748GB of memory. |
|
[2025-01-21 05:47:35,718 I 18747 18747] (raylet) store_runner.cc:48: Starting object store with directory /dev/shm, fallback /tmp/ray, and huge page support disabled |
|
[2025-01-21 05:47:35,719 I 18747 18775] (raylet) dlmalloc.cc:154: create_and_mmap_buffer(2147483656, /dev/shm/plasmaXXXXXX) |
|
[2025-01-21 05:47:35,721 I 18747 18775] (raylet) store.cc:564: Plasma store debug dump: |
|
Current usage: 0 / 2.14748 GB |
|
- num bytes created total: 0 |
|
0 pending objects of total size 0MB |
|
- objects spillable: 0 |
|
- bytes spillable: 0 |
|
- objects unsealed: 0 |
|
- bytes unsealed: 0 |
|
- objects in use: 0 |
|
- bytes in use: 0 |
|
- objects evictable: 0 |
|
- bytes evictable: 0 |
|
|
|
- objects created by worker: 0 |
|
- bytes created by worker: 0 |
|
- objects restored: 0 |
|
- bytes restored: 0 |
|
- objects received: 0 |
|
- bytes received: 0 |
|
- objects errored: 0 |
|
- bytes errored: 0 |
|
|
|
[2025-01-21 05:47:36,724 I 18747 18747] (raylet) grpc_server.cc:134: ObjectManager server started, listening on port 33987. |
|
[2025-01-21 05:47:36,727 I 18747 18747] (raylet) worker_killing_policy.cc:101: Running GroupByOwner policy. |
|
[2025-01-21 05:47:36,728 I 18747 18747] (raylet) memory_monitor.cc:47: MemoryMonitor initialized with usage threshold at 94999994368 bytes (0.95 system memory), total system memory bytes: 99999997952 |
|
[2025-01-21 05:47:36,728 I 18747 18747] (raylet) node_manager.cc:287: Initializing NodeManager node_id=381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d |
|
[2025-01-21 05:47:36,729 I 18747 18747] (raylet) grpc_server.cc:134: NodeManager server started, listening on port 35799. |
|
[2025-01-21 05:47:36,736 I 18747 18840] (raylet) agent_manager.cc:77: Monitor agent process with name dashboard_agent/424238335 |
|
[2025-01-21 05:47:36,736 I 18747 18842] (raylet) agent_manager.cc:77: Monitor agent process with name runtime_env_agent |
|
[2025-01-21 05:47:36,736 I 18747 18747] (raylet) event.cc:493: Ray Event initialized for RAYLET |
|
[2025-01-21 05:47:36,736 I 18747 18747] (raylet) event.cc:324: Set ray event level to warning |
|
[2025-01-21 05:47:36,739 I 18747 18747] (raylet) raylet.cc:134: Raylet of id, 381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d started. Raylet consists of node_manager and object_manager. node_manager address: 192.168.0.2:35799 object_manager address: 192.168.0.2:33987 hostname: 0cd925b1f73b |
|
[2025-01-21 05:47:36,742 I 18747 18747] (raylet) node_manager.cc:525: [state-dump] NodeManager: |
|
[state-dump] Node ID: 381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d |
|
[state-dump] Node name: 192.168.0.2 |
|
[state-dump] InitialConfigResources: {object_store_memory: 21474836480000, node:192.168.0.2: 10000, node:__internal_head__: 10000, memory: 752056999940000, accelerator_type:A40: 10000, GPU: 20000, CPU: 200000} |
|
[state-dump] ClusterTaskManager: |
|
[state-dump] ========== Node: 381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d ================= |
|
[state-dump] Infeasible queue length: 0 |
|
[state-dump] Schedule queue length: 0 |
|
[state-dump] Dispatch queue length: 0 |
|
[state-dump] num_waiting_for_resource: 0 |
|
[state-dump] num_waiting_for_plasma_memory: 0 |
|
[state-dump] num_waiting_for_remote_node_resources: 0 |
|
[state-dump] num_worker_not_started_by_job_config_not_exist: 0 |
|
[state-dump] num_worker_not_started_by_registration_timeout: 0 |
|
[state-dump] num_tasks_waiting_for_workers: 0 |
|
[state-dump] num_cancelled_tasks: 0 |
|
[state-dump] cluster_resource_scheduler state: |
|
[state-dump] Local id: 688648627895828852 Local resources: {"total":{node:__internal_head__: [10000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], CPU: [200000], memory: [752056999940000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {node:__internal_head__: [10000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], CPU: [200000], memory: [752056999940000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d",} is_draining: 0 is_idle: 1 Cluster resources: node id: 688648627895828852{"total":{GPU: 20000, node:192.168.0.2: 10000, node:__internal_head__: 10000, memory: 752056999940000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, CPU: 200000}}, "available": {GPU: 20000, node:192.168.0.2: 10000, node:__internal_head__: 10000, memory: 752056999940000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, CPU: 200000}}, "labels":{"ray.io/node_id":"381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} |
|
[state-dump] Waiting tasks size: 0 |
|
[state-dump] Number of executing tasks: 0 |
|
[state-dump] Number of pinned task arguments: 0 |
|
[state-dump] Number of total spilled tasks: 0 |
|
[state-dump] Number of spilled waiting tasks: 0 |
|
[state-dump] Number of spilled unschedulable tasks: 0 |
|
[state-dump] Resource usage { |
|
[state-dump] } |
|
[state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: |
|
[state-dump] |
|
[state-dump] Running tasks by scheduling class: |
|
[state-dump] ================================================== |
|
[state-dump] |
|
[state-dump] ClusterResources: |
|
[state-dump] LocalObjectManager: |
|
[state-dump] - num pinned objects: 0 |
|
[state-dump] - pinned objects size: 0 |
|
[state-dump] - num objects pending restore: 0 |
|
[state-dump] - num objects pending spill: 0 |
|
[state-dump] - num bytes pending spill: 0 |
|
[state-dump] - num bytes currently spilled: 0 |
|
[state-dump] - cumulative spill requests: 0 |
|
[state-dump] - cumulative restore requests: 0 |
|
[state-dump] - spilled objects pending delete: 0 |
|
[state-dump] |
|
[state-dump] ObjectManager: |
|
[state-dump] - num local objects: 0 |
|
[state-dump] - num unfulfilled push requests: 0 |
|
[state-dump] - num object pull requests: 0 |
|
[state-dump] - num chunks received total: 0 |
|
[state-dump] - num chunks received failed (all): 0 |
|
[state-dump] - num chunks received failed / cancelled: 0 |
|
[state-dump] - num chunks received failed / plasma error: 0 |
|
[state-dump] Event stats: |
|
[state-dump] Global stats: 0 total (0 active) |
|
[state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] Execution time: mean = -nan s, total = 0.000 s |
|
[state-dump] Event stats: |
|
[state-dump] PushManager: |
|
[state-dump] - num pushes in flight: 0 |
|
[state-dump] - num chunks in flight: 0 |
|
[state-dump] - num chunks remaining: 0 |
|
[state-dump] - max chunks allowed: 409 |
|
[state-dump] OwnershipBasedObjectDirectory: |
|
[state-dump] - num listeners: 0 |
|
[state-dump] - cumulative location updates: 0 |
|
[state-dump] - num location updates per second: 69897927663324000.000 |
|
[state-dump] - num location lookups per second: 69897927663312000.000 |
|
[state-dump] - num locations added per second: 0.000 |
|
[state-dump] - num locations removed per second: 0.000 |
|
[state-dump] BufferPool: |
|
[state-dump] - create buffer state map size: 0 |
|
[state-dump] PullManager: |
|
[state-dump] - num bytes available for pulled objects: 2147483648 |
|
[state-dump] - num bytes being pulled (all): 0 |
|
[state-dump] - num bytes being pulled / pinned: 0 |
|
[state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - first get request bundle: N/A |
|
[state-dump] - first wait request bundle: N/A |
|
[state-dump] - first task request bundle: N/A |
|
[state-dump] - num objects queued: 0 |
|
[state-dump] - num objects actively pulled (all): 0 |
|
[state-dump] - num objects actively pulled / pinned: 0 |
|
[state-dump] - num bundles being pulled: 0 |
|
[state-dump] - num pull retries: 0 |
|
[state-dump] - max timeout seconds: 0 |
|
[state-dump] - max timeout request is already processed. No entry. |
|
[state-dump] |
|
[state-dump] WorkerPool: |
|
[state-dump] - registered jobs: 0 |
|
[state-dump] - process_failed_job_config_missing: 0 |
|
[state-dump] - process_failed_rate_limited: 0 |
|
[state-dump] - process_failed_pending_registration: 0 |
|
[state-dump] - process_failed_runtime_env_setup_failed: 0 |
|
[state-dump] - num PYTHON workers: 0 |
|
[state-dump] - num PYTHON drivers: 0 |
|
[state-dump] - num PYTHON pending start requests: 0 |
|
[state-dump] - num PYTHON pending registration requests: 0 |
|
[state-dump] - num object spill callbacks queued: 0 |
|
[state-dump] - num object restore queued: 0 |
|
[state-dump] - num util functions queued: 0 |
|
[state-dump] - num idle workers: 0 |
|
[state-dump] TaskDependencyManager: |
|
[state-dump] - task deps map size: 0 |
|
[state-dump] - get req map size: 0 |
|
[state-dump] - wait req map size: 0 |
|
[state-dump] - local objects map size: 0 |
|
[state-dump] WaitManager: |
|
[state-dump] - num active wait requests: 0 |
|
[state-dump] Subscriber: |
|
[state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] Channel WORKER_REF_REMOVED_CHANNEL |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] Channel WORKER_OBJECT_EVICTION |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] num async plasma notifications: 0 |
|
[state-dump] Remote node managers: |
|
[state-dump] Event stats: |
|
[state-dump] Global stats: 28 total (13 active) |
|
[state-dump] Queueing time: mean = 1.226 ms, max = 8.958 ms, min = 32.982 us, total = 34.321 ms |
|
[state-dump] Execution time: mean = 36.687 ms, total = 1.027 s |
|
[state-dump] Event stats: |
|
[state-dump] PeriodicalRunner.RunFnPeriodically - 11 total (2 active, 1 running), Execution time: mean = 186.480 us, total = 2.051 ms, Queueing time: mean = 3.086 ms, max = 8.958 ms, min = 32.982 us, total = 33.949 ms |
|
[state-dump] ClusterResourceManager.ResetRemoteNodeView - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.553 ms, total = 1.553 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.deadline_timer.record_metrics - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 342.194 us, total = 342.194 us, Queueing time: mean = 163.766 us, max = 163.766 us, min = 163.766 us, total = 163.766 us |
|
[state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.deadline_timer.flush_free_objects - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.deadline_timer.debug_state_dump - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.419 ms, total = 2.419 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.019 s, total = 1.019 s, Queueing time: mean = 90.737 us, max = 90.737 us, min = 90.737 us, total = 90.737 us |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.ScheduleAndDispatchTasks - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ObjectManager.UpdateAvailableMemory - 1 total (0 active), Execution time: mean = 5.026 us, total = 5.026 us, Queueing time: mean = 117.849 us, max = 117.849 us, min = 117.849 us, total = 117.849 us |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 1 total (0 active), Execution time: mean = 1.597 ms, total = 1.597 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] DebugString() time ms: 0 |
|
[state-dump] |
|
[state-dump] |
|
[2025-01-21 05:47:36,744 I 18747 18747] (raylet) accessor.cc:762: Received notification for node, IsAlive = 1 node_id=381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d |
|
[2025-01-21 05:47:36,855 I 18747 18747] (raylet) worker_pool.cc:501: Started worker process with pid 18879, the token is 0 |
|
[2025-01-21 05:47:36,858 I 18747 18747] (raylet) worker_pool.cc:501: Started worker process with pid 18880, the token is 1 |
|
[2025-01-21 05:47:36,861 I 18747 18747] (raylet) worker_pool.cc:501: Started worker process with pid 18881, the token is 2 |
|
[2025-01-21 05:47:36,863 I 18747 18747] (raylet) worker_pool.cc:501: Started worker process with pid 18882, the token is 3 |
|
[2025-01-21 05:47:36,865 I 18747 18747] (raylet) worker_pool.cc:501: Started worker process with pid 18883, the token is 4 |
|
[2025-01-21 05:47:36,867 I 18747 18747] (raylet) worker_pool.cc:501: Started worker process with pid 18884, the token is 5 |
|
[2025-01-21 05:47:36,869 I 18747 18747] (raylet) worker_pool.cc:501: Started worker process with pid 18885, the token is 6 |
|
[2025-01-21 05:47:36,871 I 18747 18747] (raylet) worker_pool.cc:501: Started worker process with pid 18886, the token is 7 |
|
[2025-01-21 05:47:36,873 I 18747 18747] (raylet) worker_pool.cc:501: Started worker process with pid 18887, the token is 8 |
|
[2025-01-21 05:47:36,875 I 18747 18747] (raylet) worker_pool.cc:501: Started worker process with pid 18888, the token is 9 |
|
[2025-01-21 05:47:36,877 I 18747 18747] (raylet) worker_pool.cc:501: Started worker process with pid 18889, the token is 10 |
|
[2025-01-21 05:47:36,879 I 18747 18747] (raylet) worker_pool.cc:501: Started worker process with pid 18890, the token is 11 |
|
[2025-01-21 05:47:36,881 I 18747 18747] (raylet) worker_pool.cc:501: Started worker process with pid 18891, the token is 12 |
|
[2025-01-21 05:47:36,883 I 18747 18747] (raylet) worker_pool.cc:501: Started worker process with pid 18892, the token is 13 |
|
[2025-01-21 05:47:36,885 I 18747 18747] (raylet) worker_pool.cc:501: Started worker process with pid 18893, the token is 14 |
|
[2025-01-21 05:47:36,888 I 18747 18747] (raylet) worker_pool.cc:501: Started worker process with pid 18894, the token is 15 |
|
[2025-01-21 05:47:36,890 I 18747 18747] (raylet) worker_pool.cc:501: Started worker process with pid 18895, the token is 16 |
|
[2025-01-21 05:47:36,893 I 18747 18747] (raylet) worker_pool.cc:501: Started worker process with pid 18896, the token is 17 |
|
[2025-01-21 05:47:36,895 I 18747 18747] (raylet) worker_pool.cc:501: Started worker process with pid 18897, the token is 18 |
|
[2025-01-21 05:47:36,898 I 18747 18747] (raylet) worker_pool.cc:501: Started worker process with pid 18898, the token is 19 |
|
[2025-01-21 05:47:37,611 I 18747 18775] (raylet) object_store.cc:35: Object store current usage 8e-09 / 2.14748 GB. |
|
[2025-01-21 05:47:37,748 I 18747 18747] (raylet) worker_pool.cc:692: Job 01000000 already started in worker pool. |
|
[2025-01-21 05:47:45,760 W 18747 18769] (raylet) metric_exporter.cc:105: [1] Export metrics to agent failed: RpcError: RPC Error message: failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:64843: Failed to connect to remote host: Connection refused; RPC Error details: . This won't affect Ray, but you can lose metrics from the cluster. |
|
[2025-01-21 05:48:35,721 I 18747 18775] (raylet) store.cc:564: Plasma store debug dump: |
|
Current usage: 0 / 2.14748 GB |
|
- num bytes created total: 168 |
|
0 pending objects of total size 0MB |
|
- objects spillable: 0 |
|
- bytes spillable: 0 |
|
- objects unsealed: 0 |
|
- bytes unsealed: 0 |
|
- objects in use: 0 |
|
- bytes in use: 0 |
|
- objects evictable: 0 |
|
- bytes evictable: 0 |
|
|
|
- objects created by worker: 0 |
|
- bytes created by worker: 0 |
|
- objects restored: 0 |
|
- bytes restored: 0 |
|
- objects received: 0 |
|
- bytes received: 0 |
|
- objects errored: 0 |
|
- bytes errored: 0 |
|
|
|
[2025-01-21 05:48:36,744 I 18747 18747] (raylet) node_manager.cc:525: [state-dump] NodeManager: |
|
[state-dump] Node ID: 381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d |
|
[state-dump] Node name: 192.168.0.2 |
|
[state-dump] InitialConfigResources: {object_store_memory: 21474836480000, node:192.168.0.2: 10000, node:__internal_head__: 10000, memory: 752056999940000, accelerator_type:A40: 10000, GPU: 20000, CPU: 200000} |
|
[state-dump] ClusterTaskManager: |
|
[state-dump] ========== Node: 381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d ================= |
|
[state-dump] Infeasible queue length: 0 |
|
[state-dump] Schedule queue length: 0 |
|
[state-dump] Dispatch queue length: 0 |
|
[state-dump] num_waiting_for_resource: 0 |
|
[state-dump] num_waiting_for_plasma_memory: 0 |
|
[state-dump] num_waiting_for_remote_node_resources: 0 |
|
[state-dump] num_worker_not_started_by_job_config_not_exist: 0 |
|
[state-dump] num_worker_not_started_by_registration_timeout: 0 |
|
[state-dump] num_tasks_waiting_for_workers: 0 |
|
[state-dump] num_cancelled_tasks: 0 |
|
[state-dump] cluster_resource_scheduler state: |
|
[state-dump] Local id: 688648627895828852 Local resources: {"total":{node:__internal_head__: [10000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], CPU: [200000], memory: [752056999940000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {node:__internal_head__: [10000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], CPU: [200000], memory: [752056999940000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d",} is_draining: 0 is_idle: 1 Cluster resources: node id: 688648627895828852{"total":{GPU: 20000, accelerator_type:A40: 10000, CPU: 200000, object_store_memory: 21474836480000, node:__internal_head__: 10000, node:192.168.0.2: 10000, memory: 752056999940000}}, "available": {GPU: 20000, node:192.168.0.2: 10000, memory: 752056999940000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, accelerator_type:A40: 10000}}, "labels":{"ray.io/node_id":"381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} |
|
[state-dump] Waiting tasks size: 0 |
|
[state-dump] Number of executing tasks: 0 |
|
[state-dump] Number of pinned task arguments: 0 |
|
[state-dump] Number of total spilled tasks: 0 |
|
[state-dump] Number of spilled waiting tasks: 0 |
|
[state-dump] Number of spilled unschedulable tasks: 0 |
|
[state-dump] Resource usage { |
|
[state-dump] } |
|
[state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: |
|
[state-dump] |
|
[state-dump] Running tasks by scheduling class: |
|
[state-dump] ================================================== |
|
[state-dump] |
|
[state-dump] ClusterResources: |
|
[state-dump] LocalObjectManager: |
|
[state-dump] - num pinned objects: 0 |
|
[state-dump] - pinned objects size: 0 |
|
[state-dump] - num objects pending restore: 0 |
|
[state-dump] - num objects pending spill: 0 |
|
[state-dump] - num bytes pending spill: 0 |
|
[state-dump] - num bytes currently spilled: 0 |
|
[state-dump] - cumulative spill requests: 0 |
|
[state-dump] - cumulative restore requests: 0 |
|
[state-dump] - spilled objects pending delete: 0 |
|
[state-dump] |
|
[state-dump] ObjectManager: |
|
[state-dump] - num local objects: 0 |
|
[state-dump] - num unfulfilled push requests: 0 |
|
[state-dump] - num object pull requests: 0 |
|
[state-dump] - num chunks received total: 0 |
|
[state-dump] - num chunks received failed (all): 0 |
|
[state-dump] - num chunks received failed / cancelled: 0 |
|
[state-dump] - num chunks received failed / plasma error: 0 |
|
[state-dump] Event stats: |
|
[state-dump] Global stats: 0 total (0 active) |
|
[state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] Execution time: mean = -nan s, total = 0.000 s |
|
[state-dump] Event stats: |
|
[state-dump] PushManager: |
|
[state-dump] - num pushes in flight: 0 |
|
[state-dump] - num chunks in flight: 0 |
|
[state-dump] - num chunks remaining: 0 |
|
[state-dump] - max chunks allowed: 409 |
|
[state-dump] OwnershipBasedObjectDirectory: |
|
[state-dump] - num listeners: 0 |
|
[state-dump] - cumulative location updates: 0 |
|
[state-dump] - num location updates per second: 0.000 |
|
[state-dump] - num location lookups per second: 0.000 |
|
[state-dump] - num locations added per second: 0.000 |
|
[state-dump] - num locations removed per second: 0.000 |
|
[state-dump] BufferPool: |
|
[state-dump] - create buffer state map size: 0 |
|
[state-dump] PullManager: |
|
[state-dump] - num bytes available for pulled objects: 2147483648 |
|
[state-dump] - num bytes being pulled (all): 0 |
|
[state-dump] - num bytes being pulled / pinned: 0 |
|
[state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - first get request bundle: N/A |
|
[state-dump] - first wait request bundle: N/A |
|
[state-dump] - first task request bundle: N/A |
|
[state-dump] - num objects queued: 0 |
|
[state-dump] - num objects actively pulled (all): 0 |
|
[state-dump] - num objects actively pulled / pinned: 0 |
|
[state-dump] - num bundles being pulled: 0 |
|
[state-dump] - num pull retries: 0 |
|
[state-dump] - max timeout seconds: 0 |
|
[state-dump] - max timeout request is already processed. No entry. |
|
[state-dump] |
|
[state-dump] WorkerPool: |
|
[state-dump] - registered jobs: 1 |
|
[state-dump] - process_failed_job_config_missing: 0 |
|
[state-dump] - process_failed_rate_limited: 0 |
|
[state-dump] - process_failed_pending_registration: 0 |
|
[state-dump] - process_failed_runtime_env_setup_failed: 0 |
|
[state-dump] - num PYTHON workers: 20 |
|
[state-dump] - num PYTHON drivers: 1 |
|
[state-dump] - num PYTHON pending start requests: 0 |
|
[state-dump] - num PYTHON pending registration requests: 0 |
|
[state-dump] - num object spill callbacks queued: 0 |
|
[state-dump] - num object restore queued: 0 |
|
[state-dump] - num util functions queued: 0 |
|
[state-dump] - num idle workers: 20 |
|
[state-dump] TaskDependencyManager: |
|
[state-dump] - task deps map size: 0 |
|
[state-dump] - get req map size: 0 |
|
[state-dump] - wait req map size: 0 |
|
[state-dump] - local objects map size: 0 |
|
[state-dump] WaitManager: |
|
[state-dump] - num active wait requests: 0 |
|
[state-dump] Subscriber: |
|
[state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] Channel WORKER_REF_REMOVED_CHANNEL |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] Channel WORKER_OBJECT_EVICTION |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] num async plasma notifications: 0 |
|
[state-dump] Remote node managers: |
|
[state-dump] Event stats: |
|
[state-dump] Global stats: 5530 total (35 active) |
|
[state-dump] Queueing time: mean = 573.126 us, max = 1.191 s, min = 67.000 ns, total = 3.169 s |
|
[state-dump] Execution time: mean = 560.089 us, total = 3.097 s |
|
[state-dump] Event stats: |
|
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 1256 total (0 active), Execution time: mean = 551.680 us, total = 692.910 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 1256 total (0 active), Execution time: mean = 37.564 us, total = 47.181 ms, Queueing time: mean = 120.094 us, max = 1.170 ms, min = 4.142 us, total = 150.838 ms |
|
[state-dump] RaySyncer.OnDemandBroadcasting - 600 total (1 active), Execution time: mean = 11.152 us, total = 6.691 ms, Queueing time: mean = 97.931 us, max = 2.543 ms, min = 17.309 us, total = 58.759 ms |
|
[state-dump] NodeManager.CheckGC - 600 total (1 active), Execution time: mean = 3.035 us, total = 1.821 ms, Queueing time: mean = 105.115 us, max = 2.553 ms, min = 17.075 us, total = 63.069 ms |
|
[state-dump] ObjectManager.UpdateAvailableMemory - 600 total (0 active), Execution time: mean = 5.642 us, total = 3.385 ms, Queueing time: mean = 124.167 us, max = 9.283 ms, min = 4.574 us, total = 74.500 ms |
|
[state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 300 total (1 active), Execution time: mean = 18.242 us, total = 5.472 ms, Queueing time: mean = 73.722 us, max = 917.083 us, min = 14.481 us, total = 22.116 ms |
|
[state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 240 total (1 active), Execution time: mean = 456.797 us, total = 109.631 ms, Queueing time: mean = 72.166 us, max = 211.927 us, min = 11.434 us, total = 17.320 ms |
|
[state-dump] ClientConnection.async_read.ProcessMessageHeader - 86 total (21 active), Execution time: mean = 7.344 us, total = 631.600 us, Queueing time: mean = 30.672 ms, max = 1.191 s, min = 27.575 us, total = 2.638 s |
|
[state-dump] ClientConnection.async_read.ProcessMessage - 65 total (0 active), Execution time: mean = 928.306 us, total = 60.340 ms, Queueing time: mean = 73.811 us, max = 1.027 ms, min = 2.835 us, total = 4.798 ms |
|
[state-dump] NodeManagerService.grpc_server.GetResourceLoad - 60 total (0 active), Execution time: mean = 687.638 us, total = 41.258 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.deadline_timer.flush_free_objects - 60 total (1 active), Execution time: mean = 7.745 us, total = 464.713 us, Queueing time: mean = 168.857 us, max = 1.192 ms, min = 16.703 us, total = 10.131 ms |
|
[state-dump] NodeManager.ScheduleAndDispatchTasks - 60 total (1 active), Execution time: mean = 14.714 us, total = 882.838 us, Queueing time: mean = 76.310 us, max = 238.113 us, min = 18.599 us, total = 4.579 ms |
|
[state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 60 total (0 active), Execution time: mean = 108.305 us, total = 6.498 ms, Queueing time: mean = 113.514 us, max = 238.952 us, min = 26.158 us, total = 6.811 ms |
|
[state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 60 total (1 active), Execution time: mean = 3.051 us, total = 183.042 us, Queueing time: mean = 172.228 us, max = 1.189 ms, min = 14.355 us, total = 10.334 ms |
|
[state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.317 us, total = 28.982 us, Queueing time: mean = 49.767 us, max = 431.510 us, min = 17.047 us, total = 1.095 ms |
|
[state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 21.296 us, total = 447.224 us, Queueing time: mean = 186.718 us, max = 583.430 us, min = 33.345 us, total = 3.921 ms |
|
[state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.630 ms, total = 34.233 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 11.466 us, total = 240.795 us, Queueing time: mean = 2.139 ms, max = 21.287 ms, min = 13.920 us, total = 44.925 ms |
|
[state-dump] ClusterResourceManager.ResetRemoteNodeView - 21 total (1 active), Execution time: mean = 8.266 us, total = 173.579 us, Queueing time: mean = 81.200 us, max = 170.339 us, min = 29.645 us, total = 1.705 ms |
|
[state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 154.352 us, total = 3.241 ms, Queueing time: mean = 162.625 us, max = 432.570 us, min = 33.451 us, total = 3.415 ms |
|
[state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 217.267 us, total = 2.824 ms, Queueing time: mean = 2.965 ms, max = 8.958 ms, min = 32.982 us, total = 38.551 ms |
|
[state-dump] NodeManager.deadline_timer.record_metrics - 12 total (1 active), Execution time: mean = 551.125 us, total = 6.614 ms, Queueing time: mean = 276.973 us, max = 847.550 us, min = 44.026 us, total = 3.324 ms |
|
[state-dump] NodeManager.GcsCheckAlive - 12 total (1 active), Execution time: mean = 240.853 us, total = 2.890 ms, Queueing time: mean = 555.132 us, max = 1.215 ms, min = 116.533 us, total = 6.662 ms |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 12 total (0 active), Execution time: mean = 1.351 ms, total = 16.210 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 12 total (0 active), Execution time: mean = 49.450 us, total = 593.399 us, Queueing time: mean = 133.260 us, max = 185.679 us, min = 103.084 us, total = 1.599 ms |
|
[state-dump] NodeManager.deadline_timer.debug_state_dump - 6 total (1 active), Execution time: mean = 1.455 ms, total = 8.728 ms, Queueing time: mean = 58.917 us, max = 120.559 us, min = 31.326 us, total = 353.500 us |
|
[state-dump] RaySyncer.BroadcastMessage - 3 total (0 active), Execution time: mean = 184.876 us, total = 554.629 us, Queueing time: mean = 494.333 ns, max = 672.000 ns, min = 148.000 ns, total = 1.483 us |
|
[state-dump] - 3 total (0 active), Execution time: mean = 930.667 ns, total = 2.792 us, Queueing time: mean = 93.439 us, max = 153.432 us, min = 25.031 us, total = 280.318 us |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.377 ms, total = 2.754 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.083 us, total = 4.165 us, Queueing time: mean = 301.000 ns, max = 535.000 ns, min = 67.000 ns, total = 602.000 ns |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 129.555 us, total = 259.110 us, Queueing time: mean = 655.112 us, max = 1.180 ms, min = 129.843 us, total = 1.310 ms |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 504.334 ms, total = 1.009 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 1 total (0 active), Execution time: mean = 843.714 us, total = 843.714 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 96.544 us, total = 96.544 us, Queueing time: mean = 315.750 us, max = 315.750 us, min = 315.750 us, total = 315.750 us |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 342.194 us, total = 342.194 us, Queueing time: mean = 163.766 us, max = 163.766 us, min = 163.766 us, total = 163.766 us |
|
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 1 total (0 active), Execution time: mean = 264.026 us, total = 264.026 us, Queueing time: mean = 107.963 us, max = 107.963 us, min = 107.963 us, total = 107.963 us |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.419 ms, total = 2.419 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 302.707 us, total = 302.707 us, Queueing time: mean = 113.841 us, max = 113.841 us, min = 113.841 us, total = 113.841 us |
|
[state-dump] NodeManager.deadline_timer.print_event_loop_stats - 1 total (1 active, 1 running), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.019 s, total = 1.019 s, Queueing time: mean = 90.737 us, max = 90.737 us, min = 90.737 us, total = 90.737 us |
|
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.553 ms, total = 1.553 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.873 ms, total = 1.873 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManagerService.grpc_server.ReturnWorker - 1 total (0 active), Execution time: mean = 742.755 us, total = 742.755 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 29.991 us, total = 29.991 us, Queueing time: mean = 111.550 us, max = 111.550 us, min = 111.550 us, total = 111.550 us |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.569 ms, total = 1.569 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.597 ms, total = 1.597 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] WorkerPool.PopWorkerCallback - 1 total (0 active), Execution time: mean = 55.365 us, total = 55.365 us, Queueing time: mean = 38.510 us, max = 38.510 us, min = 38.510 us, total = 38.510 us |
|
[state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 1 total (0 active), Execution time: mean = 149.640 us, total = 149.640 us, Queueing time: mean = 140.746 us, max = 140.746 us, min = 140.746 us, total = 140.746 us |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 224.222 us, total = 224.222 us, Queueing time: mean = 119.308 us, max = 119.308 us, min = 119.308 us, total = 119.308 us |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 137.575 us, total = 137.575 us, Queueing time: mean = 36.079 us, max = 36.079 us, min = 36.079 us, total = 36.079 us |
|
[state-dump] DebugString() time ms: 1 |
|
[state-dump] |
|
[state-dump] |
|
[2025-01-21 05:49:35,722 I 18747 18775] (raylet) store.cc:564: Plasma store debug dump: |
|
Current usage: 0 / 2.14748 GB |
|
- num bytes created total: 168 |
|
0 pending objects of total size 0MB |
|
- objects spillable: 0 |
|
- bytes spillable: 0 |
|
- objects unsealed: 0 |
|
- bytes unsealed: 0 |
|
- objects in use: 0 |
|
- bytes in use: 0 |
|
- objects evictable: 0 |
|
- bytes evictable: 0 |
|
|
|
- objects created by worker: 0 |
|
- bytes created by worker: 0 |
|
- objects restored: 0 |
|
- bytes restored: 0 |
|
- objects received: 0 |
|
- bytes received: 0 |
|
- objects errored: 0 |
|
- bytes errored: 0 |
|
|
|
[2025-01-21 05:49:36,747 I 18747 18747] (raylet) node_manager.cc:525: [state-dump] NodeManager: |
|
[state-dump] Node ID: 381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d |
|
[state-dump] Node name: 192.168.0.2 |
|
[state-dump] InitialConfigResources: {object_store_memory: 21474836480000, node:192.168.0.2: 10000, node:__internal_head__: 10000, memory: 752056999940000, accelerator_type:A40: 10000, GPU: 20000, CPU: 200000} |
|
[state-dump] ClusterTaskManager: |
|
[state-dump] ========== Node: 381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d ================= |
|
[state-dump] Infeasible queue length: 0 |
|
[state-dump] Schedule queue length: 0 |
|
[state-dump] Dispatch queue length: 0 |
|
[state-dump] num_waiting_for_resource: 0 |
|
[state-dump] num_waiting_for_plasma_memory: 0 |
|
[state-dump] num_waiting_for_remote_node_resources: 0 |
|
[state-dump] num_worker_not_started_by_job_config_not_exist: 0 |
|
[state-dump] num_worker_not_started_by_registration_timeout: 0 |
|
[state-dump] num_tasks_waiting_for_workers: 0 |
|
[state-dump] num_cancelled_tasks: 0 |
|
[state-dump] cluster_resource_scheduler state: |
|
[state-dump] Local id: 688648627895828852 Local resources: {"total":{node:__internal_head__: [10000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], CPU: [200000], memory: [752056999940000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {node:__internal_head__: [10000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], CPU: [200000], memory: [752056999940000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d",} is_draining: 0 is_idle: 1 Cluster resources: node id: 688648627895828852{"total":{GPU: 20000, accelerator_type:A40: 10000, CPU: 200000, object_store_memory: 21474836480000, node:__internal_head__: 10000, node:192.168.0.2: 10000, memory: 752056999940000}}, "available": {GPU: 20000, node:192.168.0.2: 10000, memory: 752056999940000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, accelerator_type:A40: 10000}}, "labels":{"ray.io/node_id":"381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} |
|
[state-dump] Waiting tasks size: 0 |
|
[state-dump] Number of executing tasks: 0 |
|
[state-dump] Number of pinned task arguments: 0 |
|
[state-dump] Number of total spilled tasks: 0 |
|
[state-dump] Number of spilled waiting tasks: 0 |
|
[state-dump] Number of spilled unschedulable tasks: 0 |
|
[state-dump] Resource usage { |
|
[state-dump] } |
|
[state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: |
|
[state-dump] |
|
[state-dump] Running tasks by scheduling class: |
|
[state-dump] ================================================== |
|
[state-dump] |
|
[state-dump] ClusterResources: |
|
[state-dump] LocalObjectManager: |
|
[state-dump] - num pinned objects: 0 |
|
[state-dump] - pinned objects size: 0 |
|
[state-dump] - num objects pending restore: 0 |
|
[state-dump] - num objects pending spill: 0 |
|
[state-dump] - num bytes pending spill: 0 |
|
[state-dump] - num bytes currently spilled: 0 |
|
[state-dump] - cumulative spill requests: 0 |
|
[state-dump] - cumulative restore requests: 0 |
|
[state-dump] - spilled objects pending delete: 0 |
|
[state-dump] |
|
[state-dump] ObjectManager: |
|
[state-dump] - num local objects: 0 |
|
[state-dump] - num unfulfilled push requests: 0 |
|
[state-dump] - num object pull requests: 0 |
|
[state-dump] - num chunks received total: 0 |
|
[state-dump] - num chunks received failed (all): 0 |
|
[state-dump] - num chunks received failed / cancelled: 0 |
|
[state-dump] - num chunks received failed / plasma error: 0 |
|
[state-dump] Event stats: |
|
[state-dump] Global stats: 0 total (0 active) |
|
[state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] Execution time: mean = -nan s, total = 0.000 s |
|
[state-dump] Event stats: |
|
[state-dump] PushManager: |
|
[state-dump] - num pushes in flight: 0 |
|
[state-dump] - num chunks in flight: 0 |
|
[state-dump] - num chunks remaining: 0 |
|
[state-dump] - max chunks allowed: 409 |
|
[state-dump] OwnershipBasedObjectDirectory: |
|
[state-dump] - num listeners: 0 |
|
[state-dump] - cumulative location updates: 0 |
|
[state-dump] - num location updates per second: 0.000 |
|
[state-dump] - num location lookups per second: 0.000 |
|
[state-dump] - num locations added per second: 0.000 |
|
[state-dump] - num locations removed per second: 0.000 |
|
[state-dump] BufferPool: |
|
[state-dump] - create buffer state map size: 0 |
|
[state-dump] PullManager: |
|
[state-dump] - num bytes available for pulled objects: 2147483648 |
|
[state-dump] - num bytes being pulled (all): 0 |
|
[state-dump] - num bytes being pulled / pinned: 0 |
|
[state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - first get request bundle: N/A |
|
[state-dump] - first wait request bundle: N/A |
|
[state-dump] - first task request bundle: N/A |
|
[state-dump] - num objects queued: 0 |
|
[state-dump] - num objects actively pulled (all): 0 |
|
[state-dump] - num objects actively pulled / pinned: 0 |
|
[state-dump] - num bundles being pulled: 0 |
|
[state-dump] - num pull retries: 0 |
|
[state-dump] - max timeout seconds: 0 |
|
[state-dump] - max timeout request is already processed. No entry. |
|
[state-dump] |
|
[state-dump] WorkerPool: |
|
[state-dump] - registered jobs: 1 |
|
[state-dump] - process_failed_job_config_missing: 0 |
|
[state-dump] - process_failed_rate_limited: 0 |
|
[state-dump] - process_failed_pending_registration: 0 |
|
[state-dump] - process_failed_runtime_env_setup_failed: 0 |
|
[state-dump] - num PYTHON workers: 20 |
|
[state-dump] - num PYTHON drivers: 1 |
|
[state-dump] - num PYTHON pending start requests: 0 |
|
[state-dump] - num PYTHON pending registration requests: 0 |
|
[state-dump] - num object spill callbacks queued: 0 |
|
[state-dump] - num object restore queued: 0 |
|
[state-dump] - num util functions queued: 0 |
|
[state-dump] - num idle workers: 20 |
|
[state-dump] TaskDependencyManager: |
|
[state-dump] - task deps map size: 0 |
|
[state-dump] - get req map size: 0 |
|
[state-dump] - wait req map size: 0 |
|
[state-dump] - local objects map size: 0 |
|
[state-dump] WaitManager: |
|
[state-dump] - num active wait requests: 0 |
|
[state-dump] Subscriber: |
|
[state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] Channel WORKER_REF_REMOVED_CHANNEL |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] Channel WORKER_OBJECT_EVICTION |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] num async plasma notifications: 0 |
|
[state-dump] Remote node managers: |
|
[state-dump] Event stats: |
|
[state-dump] Global stats: 10762 total (35 active) |
|
[state-dump] Queueing time: mean = 328.202 us, max = 1.191 s, min = 67.000 ns, total = 3.532 s |
|
[state-dump] Execution time: mean = 366.124 us, total = 3.940 s |
|
[state-dump] Event stats: |
|
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 2516 total (0 active), Execution time: mean = 516.103 us, total = 1.299 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 2516 total (0 active), Execution time: mean = 35.351 us, total = 88.942 ms, Queueing time: mean = 112.130 us, max = 1.474 ms, min = 4.142 us, total = 282.118 ms |
|
[state-dump] RaySyncer.OnDemandBroadcasting - 1199 total (1 active), Execution time: mean = 9.886 us, total = 11.853 ms, Queueing time: mean = 88.245 us, max = 2.543 ms, min = 17.309 us, total = 105.806 ms |
|
[state-dump] NodeManager.CheckGC - 1199 total (1 active), Execution time: mean = 2.891 us, total = 3.466 ms, Queueing time: mean = 94.322 us, max = 2.553 ms, min = 12.906 us, total = 113.093 ms |
|
[state-dump] ObjectManager.UpdateAvailableMemory - 1199 total (0 active), Execution time: mean = 5.075 us, total = 6.085 ms, Queueing time: mean = 106.530 us, max = 9.283 ms, min = 4.283 us, total = 127.730 ms |
|
[state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 600 total (1 active), Execution time: mean = 16.715 us, total = 10.029 ms, Queueing time: mean = 68.180 us, max = 992.162 us, min = 9.895 us, total = 40.908 ms |
|
[state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 480 total (1 active), Execution time: mean = 442.599 us, total = 212.447 ms, Queueing time: mean = 75.926 us, max = 3.232 ms, min = 8.760 us, total = 36.445 ms |
|
[state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 120 total (0 active), Execution time: mean = 105.204 us, total = 12.625 ms, Queueing time: mean = 107.876 us, max = 238.952 us, min = 21.549 us, total = 12.945 ms |
|
[state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 120 total (1 active), Execution time: mean = 2.936 us, total = 352.314 us, Queueing time: mean = 162.767 us, max = 1.982 ms, min = 7.617 us, total = 19.532 ms |
|
[state-dump] NodeManager.ScheduleAndDispatchTasks - 120 total (1 active), Execution time: mean = 13.943 us, total = 1.673 ms, Queueing time: mean = 89.610 us, max = 2.272 ms, min = 17.139 us, total = 10.753 ms |
|
[state-dump] NodeManager.deadline_timer.flush_free_objects - 120 total (1 active), Execution time: mean = 7.496 us, total = 899.544 us, Queueing time: mean = 159.579 us, max = 1.983 ms, min = 10.158 us, total = 19.149 ms |
|
[state-dump] NodeManagerService.grpc_server.GetResourceLoad - 120 total (0 active), Execution time: mean = 641.393 us, total = 76.967 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ClientConnection.async_read.ProcessMessageHeader - 86 total (21 active), Execution time: mean = 7.344 us, total = 631.600 us, Queueing time: mean = 30.672 ms, max = 1.191 s, min = 27.575 us, total = 2.638 s |
|
[state-dump] ClientConnection.async_read.ProcessMessage - 65 total (0 active), Execution time: mean = 928.306 us, total = 60.340 ms, Queueing time: mean = 73.811 us, max = 1.027 ms, min = 2.835 us, total = 4.798 ms |
|
[state-dump] ClusterResourceManager.ResetRemoteNodeView - 41 total (1 active), Execution time: mean = 7.828 us, total = 320.931 us, Queueing time: mean = 70.124 us, max = 170.339 us, min = 21.940 us, total = 2.875 ms |
|
[state-dump] NodeManager.deadline_timer.record_metrics - 24 total (1 active), Execution time: mean = 518.353 us, total = 12.440 ms, Queueing time: mean = 283.542 us, max = 1.453 ms, min = 22.010 us, total = 6.805 ms |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 24 total (0 active), Execution time: mean = 46.637 us, total = 1.119 ms, Queueing time: mean = 112.639 us, max = 223.007 us, min = 18.869 us, total = 2.703 ms |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 24 total (0 active), Execution time: mean = 1.276 ms, total = 30.612 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.GcsCheckAlive - 24 total (1 active), Execution time: mean = 233.340 us, total = 5.600 ms, Queueing time: mean = 552.154 us, max = 1.871 ms, min = 116.533 us, total = 13.252 ms |
|
[state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.317 us, total = 28.982 us, Queueing time: mean = 49.767 us, max = 431.510 us, min = 17.047 us, total = 1.095 ms |
|
[state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 154.352 us, total = 3.241 ms, Queueing time: mean = 162.625 us, max = 432.570 us, min = 33.451 us, total = 3.415 ms |
|
[state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 11.466 us, total = 240.795 us, Queueing time: mean = 2.139 ms, max = 21.287 ms, min = 13.920 us, total = 44.925 ms |
|
[state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 21.296 us, total = 447.224 us, Queueing time: mean = 186.718 us, max = 583.430 us, min = 33.345 us, total = 3.921 ms |
|
[state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.630 ms, total = 34.233 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 217.267 us, total = 2.824 ms, Queueing time: mean = 2.965 ms, max = 8.958 ms, min = 32.982 us, total = 38.551 ms |
|
[state-dump] NodeManager.deadline_timer.debug_state_dump - 12 total (1 active), Execution time: mean = 1.487 ms, total = 17.846 ms, Queueing time: mean = 53.499 us, max = 120.559 us, min = 13.718 us, total = 641.991 us |
|
[state-dump] - 3 total (0 active), Execution time: mean = 930.667 ns, total = 2.792 us, Queueing time: mean = 93.439 us, max = 153.432 us, min = 25.031 us, total = 280.318 us |
|
[state-dump] RaySyncer.BroadcastMessage - 3 total (0 active), Execution time: mean = 184.876 us, total = 554.629 us, Queueing time: mean = 494.333 ns, max = 672.000 ns, min = 148.000 ns, total = 1.483 us |
|
[state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.083 us, total = 4.165 us, Queueing time: mean = 301.000 ns, max = 535.000 ns, min = 67.000 ns, total = 602.000 ns |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 129.555 us, total = 259.110 us, Queueing time: mean = 655.112 us, max = 1.180 ms, min = 129.843 us, total = 1.310 ms |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 504.334 ms, total = 1.009 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.377 ms, total = 2.754 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.deadline_timer.print_event_loop_stats - 2 total (1 active, 1 running), Execution time: mean = 1.366 ms, total = 2.732 ms, Queueing time: mean = 30.976 us, max = 61.952 us, min = 61.952 us, total = 61.952 us |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 342.194 us, total = 342.194 us, Queueing time: mean = 163.766 us, max = 163.766 us, min = 163.766 us, total = 163.766 us |
|
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 1 total (0 active), Execution time: mean = 843.714 us, total = 843.714 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.419 ms, total = 2.419 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 1 total (0 active), Execution time: mean = 264.026 us, total = 264.026 us, Queueing time: mean = 107.963 us, max = 107.963 us, min = 107.963 us, total = 107.963 us |
|
[state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 96.544 us, total = 96.544 us, Queueing time: mean = 315.750 us, max = 315.750 us, min = 315.750 us, total = 315.750 us |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 302.707 us, total = 302.707 us, Queueing time: mean = 113.841 us, max = 113.841 us, min = 113.841 us, total = 113.841 us |
|
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.019 s, total = 1.019 s, Queueing time: mean = 90.737 us, max = 90.737 us, min = 90.737 us, total = 90.737 us |
|
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.553 ms, total = 1.553 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.873 ms, total = 1.873 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManagerService.grpc_server.ReturnWorker - 1 total (0 active), Execution time: mean = 742.755 us, total = 742.755 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 29.991 us, total = 29.991 us, Queueing time: mean = 111.550 us, max = 111.550 us, min = 111.550 us, total = 111.550 us |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.569 ms, total = 1.569 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.597 ms, total = 1.597 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] WorkerPool.PopWorkerCallback - 1 total (0 active), Execution time: mean = 55.365 us, total = 55.365 us, Queueing time: mean = 38.510 us, max = 38.510 us, min = 38.510 us, total = 38.510 us |
|
[state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 1 total (0 active), Execution time: mean = 149.640 us, total = 149.640 us, Queueing time: mean = 140.746 us, max = 140.746 us, min = 140.746 us, total = 140.746 us |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 224.222 us, total = 224.222 us, Queueing time: mean = 119.308 us, max = 119.308 us, min = 119.308 us, total = 119.308 us |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 137.575 us, total = 137.575 us, Queueing time: mean = 36.079 us, max = 36.079 us, min = 36.079 us, total = 36.079 us |
|
[state-dump] DebugString() time ms: 1 |
|
[state-dump] |
|
[state-dump] |
|
[2025-01-21 05:50:35,722 I 18747 18775] (raylet) store.cc:564: Plasma store debug dump: |
|
Current usage: 0 / 2.14748 GB |
|
- num bytes created total: 168 |
|
0 pending objects of total size 0MB |
|
- objects spillable: 0 |
|
- bytes spillable: 0 |
|
- objects unsealed: 0 |
|
- bytes unsealed: 0 |
|
- objects in use: 0 |
|
- bytes in use: 0 |
|
- objects evictable: 0 |
|
- bytes evictable: 0 |
|
|
|
- objects created by worker: 0 |
|
- bytes created by worker: 0 |
|
- objects restored: 0 |
|
- bytes restored: 0 |
|
- objects received: 0 |
|
- bytes received: 0 |
|
- objects errored: 0 |
|
- bytes errored: 0 |
|
|
|
[2025-01-21 05:50:36,750 I 18747 18747] (raylet) node_manager.cc:525: [state-dump] NodeManager: |
|
[state-dump] Node ID: 381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d |
|
[state-dump] Node name: 192.168.0.2 |
|
[state-dump] InitialConfigResources: {object_store_memory: 21474836480000, node:192.168.0.2: 10000, node:__internal_head__: 10000, memory: 752056999940000, accelerator_type:A40: 10000, GPU: 20000, CPU: 200000} |
|
[state-dump] ClusterTaskManager: |
|
[state-dump] ========== Node: 381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d ================= |
|
[state-dump] Infeasible queue length: 0 |
|
[state-dump] Schedule queue length: 0 |
|
[state-dump] Dispatch queue length: 0 |
|
[state-dump] num_waiting_for_resource: 0 |
|
[state-dump] num_waiting_for_plasma_memory: 0 |
|
[state-dump] num_waiting_for_remote_node_resources: 0 |
|
[state-dump] num_worker_not_started_by_job_config_not_exist: 0 |
|
[state-dump] num_worker_not_started_by_registration_timeout: 0 |
|
[state-dump] num_tasks_waiting_for_workers: 0 |
|
[state-dump] num_cancelled_tasks: 0 |
|
[state-dump] cluster_resource_scheduler state: |
|
[state-dump] Local id: 688648627895828852 Local resources: {"total":{node:__internal_head__: [10000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], CPU: [200000], memory: [752056999940000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {node:__internal_head__: [10000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], CPU: [200000], memory: [752056999940000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d",} is_draining: 0 is_idle: 1 Cluster resources: node id: 688648627895828852{"total":{object_store_memory: 21474836480000, node:192.168.0.2: 10000, memory: 752056999940000, node:__internal_head__: 10000, GPU: 20000, accelerator_type:A40: 10000, CPU: 200000}}, "available": {object_store_memory: 21474836480000, memory: 752056999940000, node:__internal_head__: 10000, node:192.168.0.2: 10000, GPU: 20000, accelerator_type:A40: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} |
|
[state-dump] Waiting tasks size: 0 |
|
[state-dump] Number of executing tasks: 0 |
|
[state-dump] Number of pinned task arguments: 0 |
|
[state-dump] Number of total spilled tasks: 0 |
|
[state-dump] Number of spilled waiting tasks: 0 |
|
[state-dump] Number of spilled unschedulable tasks: 0 |
|
[state-dump] Resource usage { |
|
[state-dump] } |
|
[state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: |
|
[state-dump] |
|
[state-dump] Running tasks by scheduling class: |
|
[state-dump] ================================================== |
|
[state-dump] |
|
[state-dump] ClusterResources: |
|
[state-dump] LocalObjectManager: |
|
[state-dump] - num pinned objects: 0 |
|
[state-dump] - pinned objects size: 0 |
|
[state-dump] - num objects pending restore: 0 |
|
[state-dump] - num objects pending spill: 0 |
|
[state-dump] - num bytes pending spill: 0 |
|
[state-dump] - num bytes currently spilled: 0 |
|
[state-dump] - cumulative spill requests: 0 |
|
[state-dump] - cumulative restore requests: 0 |
|
[state-dump] - spilled objects pending delete: 0 |
|
[state-dump] |
|
[state-dump] ObjectManager: |
|
[state-dump] - num local objects: 0 |
|
[state-dump] - num unfulfilled push requests: 0 |
|
[state-dump] - num object pull requests: 0 |
|
[state-dump] - num chunks received total: 0 |
|
[state-dump] - num chunks received failed (all): 0 |
|
[state-dump] - num chunks received failed / cancelled: 0 |
|
[state-dump] - num chunks received failed / plasma error: 0 |
|
[state-dump] Event stats: |
|
[state-dump] Global stats: 0 total (0 active) |
|
[state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] Execution time: mean = -nan s, total = 0.000 s |
|
[state-dump] Event stats: |
|
[state-dump] PushManager: |
|
[state-dump] - num pushes in flight: 0 |
|
[state-dump] - num chunks in flight: 0 |
|
[state-dump] - num chunks remaining: 0 |
|
[state-dump] - max chunks allowed: 409 |
|
[state-dump] OwnershipBasedObjectDirectory: |
|
[state-dump] - num listeners: 0 |
|
[state-dump] - cumulative location updates: 0 |
|
[state-dump] - num location updates per second: 0.000 |
|
[state-dump] - num location lookups per second: 0.000 |
|
[state-dump] - num locations added per second: 0.000 |
|
[state-dump] - num locations removed per second: 0.000 |
|
[state-dump] BufferPool: |
|
[state-dump] - create buffer state map size: 0 |
|
[state-dump] PullManager: |
|
[state-dump] - num bytes available for pulled objects: 2147483648 |
|
[state-dump] - num bytes being pulled (all): 0 |
|
[state-dump] - num bytes being pulled / pinned: 0 |
|
[state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - first get request bundle: N/A |
|
[state-dump] - first wait request bundle: N/A |
|
[state-dump] - first task request bundle: N/A |
|
[state-dump] - num objects queued: 0 |
|
[state-dump] - num objects actively pulled (all): 0 |
|
[state-dump] - num objects actively pulled / pinned: 0 |
|
[state-dump] - num bundles being pulled: 0 |
|
[state-dump] - num pull retries: 0 |
|
[state-dump] - max timeout seconds: 0 |
|
[state-dump] - max timeout request is already processed. No entry. |
|
[state-dump] |
|
[state-dump] WorkerPool: |
|
[state-dump] - registered jobs: 1 |
|
[state-dump] - process_failed_job_config_missing: 0 |
|
[state-dump] - process_failed_rate_limited: 0 |
|
[state-dump] - process_failed_pending_registration: 0 |
|
[state-dump] - process_failed_runtime_env_setup_failed: 0 |
|
[state-dump] - num PYTHON workers: 20 |
|
[state-dump] - num PYTHON drivers: 1 |
|
[state-dump] - num PYTHON pending start requests: 0 |
|
[state-dump] - num PYTHON pending registration requests: 0 |
|
[state-dump] - num object spill callbacks queued: 0 |
|
[state-dump] - num object restore queued: 0 |
|
[state-dump] - num util functions queued: 0 |
|
[state-dump] - num idle workers: 20 |
|
[state-dump] TaskDependencyManager: |
|
[state-dump] - task deps map size: 0 |
|
[state-dump] - get req map size: 0 |
|
[state-dump] - wait req map size: 0 |
|
[state-dump] - local objects map size: 0 |
|
[state-dump] WaitManager: |
|
[state-dump] - num active wait requests: 0 |
|
[state-dump] Subscriber: |
|
[state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] Channel WORKER_REF_REMOVED_CHANNEL |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] Channel WORKER_OBJECT_EVICTION |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] num async plasma notifications: 0 |
|
[state-dump] Remote node managers: |
|
[state-dump] Event stats: |
|
[state-dump] Global stats: 16003 total (35 active) |
|
[state-dump] Queueing time: mean = 15.445 ms, max = 122.187 s, min = 67.000 ns, total = 247.167 s |
|
[state-dump] Execution time: mean = 300.787 us, total = 4.813 s |
|
[state-dump] Event stats: |
|
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 3774 total (0 active), Execution time: mean = 509.198 us, total = 1.922 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 3774 total (0 active), Execution time: mean = 34.959 us, total = 131.937 ms, Queueing time: mean = 107.808 us, max = 1.474 ms, min = 4.142 us, total = 406.866 ms |
|
[state-dump] RaySyncer.OnDemandBroadcasting - 1799 total (1 active), Execution time: mean = 10.022 us, total = 18.030 ms, Queueing time: mean = 87.365 us, max = 2.543 ms, min = 8.344 us, total = 157.170 ms |
|
[state-dump] ObjectManager.UpdateAvailableMemory - 1799 total (0 active), Execution time: mean = 5.034 us, total = 9.056 ms, Queueing time: mean = 102.001 us, max = 9.283 ms, min = 4.232 us, total = 183.499 ms |
|
[state-dump] NodeManager.CheckGC - 1799 total (1 active), Execution time: mean = 2.907 us, total = 5.230 ms, Queueing time: mean = 93.569 us, max = 2.553 ms, min = 6.447 us, total = 168.331 ms |
|
[state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 900 total (1 active), Execution time: mean = 17.181 us, total = 15.463 ms, Queueing time: mean = 67.441 us, max = 992.162 us, min = 9.895 us, total = 60.697 ms |
|
[state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 719 total (1 active), Execution time: mean = 443.574 us, total = 318.929 ms, Queueing time: mean = 73.418 us, max = 3.232 ms, min = 8.760 us, total = 52.787 ms |
|
[state-dump] NodeManager.ScheduleAndDispatchTasks - 180 total (1 active), Execution time: mean = 14.258 us, total = 2.566 ms, Queueing time: mean = 82.925 us, max = 2.272 ms, min = 17.139 us, total = 14.927 ms |
|
[state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 180 total (0 active), Execution time: mean = 105.244 us, total = 18.944 ms, Queueing time: mean = 109.004 us, max = 238.952 us, min = 20.940 us, total = 19.621 ms |
|
[state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 180 total (1 active), Execution time: mean = 2.948 us, total = 530.591 us, Queueing time: mean = 159.227 us, max = 1.982 ms, min = 7.617 us, total = 28.661 ms |
|
[state-dump] NodeManager.deadline_timer.flush_free_objects - 180 total (1 active), Execution time: mean = 7.616 us, total = 1.371 ms, Queueing time: mean = 155.993 us, max = 1.983 ms, min = 10.158 us, total = 28.079 ms |
|
[state-dump] NodeManagerService.grpc_server.GetResourceLoad - 180 total (0 active), Execution time: mean = 637.940 us, total = 114.829 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ClientConnection.async_read.ProcessMessageHeader - 88 total (21 active), Execution time: mean = 7.299 us, total = 642.290 us, Queueing time: mean = 2.794 s, max = 122.187 s, min = 27.575 us, total = 245.907 s |
|
[state-dump] ClientConnection.async_read.ProcessMessage - 67 total (0 active), Execution time: mean = 900.865 us, total = 60.358 ms, Queueing time: mean = 71.782 us, max = 1.027 ms, min = 2.835 us, total = 4.809 ms |
|
[state-dump] ClusterResourceManager.ResetRemoteNodeView - 61 total (1 active), Execution time: mean = 8.120 us, total = 495.292 us, Queueing time: mean = 76.745 us, max = 253.106 us, min = 21.940 us, total = 4.681 ms |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 36 total (0 active), Execution time: mean = 1.286 ms, total = 46.280 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.GcsCheckAlive - 36 total (1 active), Execution time: mean = 232.208 us, total = 8.359 ms, Queueing time: mean = 552.636 us, max = 1.871 ms, min = 116.533 us, total = 19.895 ms |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 36 total (0 active), Execution time: mean = 45.972 us, total = 1.655 ms, Queueing time: mean = 104.957 us, max = 223.007 us, min = 15.566 us, total = 3.778 ms |
|
[state-dump] NodeManager.deadline_timer.record_metrics - 36 total (1 active), Execution time: mean = 515.772 us, total = 18.568 ms, Queueing time: mean = 279.037 us, max = 1.453 ms, min = 9.061 us, total = 10.045 ms |
|
[state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.317 us, total = 28.982 us, Queueing time: mean = 49.767 us, max = 431.510 us, min = 17.047 us, total = 1.095 ms |
|
[state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 11.466 us, total = 240.795 us, Queueing time: mean = 2.139 ms, max = 21.287 ms, min = 13.920 us, total = 44.925 ms |
|
[state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 154.352 us, total = 3.241 ms, Queueing time: mean = 162.625 us, max = 432.570 us, min = 33.451 us, total = 3.415 ms |
|
[state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 21.296 us, total = 447.224 us, Queueing time: mean = 186.718 us, max = 583.430 us, min = 33.345 us, total = 3.921 ms |
|
[state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.630 ms, total = 34.233 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.deadline_timer.debug_state_dump - 18 total (1 active), Execution time: mean = 1.496 ms, total = 26.925 ms, Queueing time: mean = 55.004 us, max = 120.559 us, min = 11.928 us, total = 990.075 us |
|
[state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 217.267 us, total = 2.824 ms, Queueing time: mean = 2.965 ms, max = 8.958 ms, min = 32.982 us, total = 38.551 ms |
|
[state-dump] RaySyncer.BroadcastMessage - 4 total (0 active), Execution time: mean = 164.504 us, total = 658.017 us, Queueing time: mean = 446.500 ns, max = 672.000 ns, min = 148.000 ns, total = 1.786 us |
|
[state-dump] - 4 total (0 active), Execution time: mean = 955.500 ns, total = 3.822 us, Queueing time: mean = 82.430 us, max = 153.432 us, min = 25.031 us, total = 329.719 us |
|
[state-dump] NodeManager.deadline_timer.print_event_loop_stats - 3 total (1 active, 1 running), Execution time: mean = 1.754 ms, total = 5.262 ms, Queueing time: mean = 26.636 us, max = 61.952 us, min = 17.957 us, total = 79.909 us |
|
[state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.083 us, total = 4.165 us, Queueing time: mean = 301.000 ns, max = 535.000 ns, min = 67.000 ns, total = 602.000 ns |
|
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 2 total (0 active), Execution time: mean = 256.966 us, total = 513.933 us, Queueing time: mean = 110.073 us, max = 112.184 us, min = 107.963 us, total = 220.147 us |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.377 ms, total = 2.754 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 504.334 ms, total = 1.009 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManagerService.grpc_server.ReturnWorker - 2 total (0 active), Execution time: mean = 605.399 us, total = 1.211 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 129.555 us, total = 259.110 us, Queueing time: mean = 655.112 us, max = 1.180 ms, min = 129.843 us, total = 1.310 ms |
|
[state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 2 total (0 active), Execution time: mean = 97.721 us, total = 195.441 us, Queueing time: mean = 81.200 us, max = 140.746 us, min = 21.653 us, total = 162.399 us |
|
[state-dump] WorkerPool.PopWorkerCallback - 2 total (0 active), Execution time: mean = 44.829 us, total = 89.658 us, Queueing time: mean = 29.422 us, max = 38.510 us, min = 20.335 us, total = 58.845 us |
|
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 2 total (0 active), Execution time: mean = 780.366 us, total = 1.561 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.419 ms, total = 2.419 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 302.707 us, total = 302.707 us, Queueing time: mean = 113.841 us, max = 113.841 us, min = 113.841 us, total = 113.841 us |
|
[state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 96.544 us, total = 96.544 us, Queueing time: mean = 315.750 us, max = 315.750 us, min = 315.750 us, total = 315.750 us |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 342.194 us, total = 342.194 us, Queueing time: mean = 163.766 us, max = 163.766 us, min = 163.766 us, total = 163.766 us |
|
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.019 s, total = 1.019 s, Queueing time: mean = 90.737 us, max = 90.737 us, min = 90.737 us, total = 90.737 us |
|
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.553 ms, total = 1.553 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.873 ms, total = 1.873 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 29.991 us, total = 29.991 us, Queueing time: mean = 111.550 us, max = 111.550 us, min = 111.550 us, total = 111.550 us |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.569 ms, total = 1.569 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.597 ms, total = 1.597 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 224.222 us, total = 224.222 us, Queueing time: mean = 119.308 us, max = 119.308 us, min = 119.308 us, total = 119.308 us |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 137.575 us, total = 137.575 us, Queueing time: mean = 36.079 us, max = 36.079 us, min = 36.079 us, total = 36.079 us |
|
[state-dump] DebugString() time ms: 1 |
|
[state-dump] |
|
[state-dump] |
|
[2025-01-21 05:51:35,722 I 18747 18775] (raylet) store.cc:564: Plasma store debug dump: |
|
Current usage: 0 / 2.14748 GB |
|
- num bytes created total: 168 |
|
0 pending objects of total size 0MB |
|
- objects spillable: 0 |
|
- bytes spillable: 0 |
|
- objects unsealed: 0 |
|
- bytes unsealed: 0 |
|
- objects in use: 0 |
|
- bytes in use: 0 |
|
- objects evictable: 0 |
|
- bytes evictable: 0 |
|
|
|
- objects created by worker: 0 |
|
- bytes created by worker: 0 |
|
- objects restored: 0 |
|
- bytes restored: 0 |
|
- objects received: 0 |
|
- bytes received: 0 |
|
- objects errored: 0 |
|
- bytes errored: 0 |
|
|
|
[2025-01-21 05:51:36,752 I 18747 18747] (raylet) node_manager.cc:525: [state-dump] NodeManager: |
|
[state-dump] Node ID: 381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d |
|
[state-dump] Node name: 192.168.0.2 |
|
[state-dump] InitialConfigResources: {object_store_memory: 21474836480000, node:192.168.0.2: 10000, node:__internal_head__: 10000, memory: 752056999940000, accelerator_type:A40: 10000, GPU: 20000, CPU: 200000} |
|
[state-dump] ClusterTaskManager: |
|
[state-dump] ========== Node: 381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d ================= |
|
[state-dump] Infeasible queue length: 0 |
|
[state-dump] Schedule queue length: 0 |
|
[state-dump] Dispatch queue length: 0 |
|
[state-dump] num_waiting_for_resource: 0 |
|
[state-dump] num_waiting_for_plasma_memory: 0 |
|
[state-dump] num_waiting_for_remote_node_resources: 0 |
|
[state-dump] num_worker_not_started_by_job_config_not_exist: 0 |
|
[state-dump] num_worker_not_started_by_registration_timeout: 0 |
|
[state-dump] num_tasks_waiting_for_workers: 0 |
|
[state-dump] num_cancelled_tasks: 0 |
|
[state-dump] cluster_resource_scheduler state: |
|
[state-dump] Local id: 688648627895828852 Local resources: {"total":{node:__internal_head__: [10000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], CPU: [200000], memory: [752056999940000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {node:__internal_head__: [10000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], CPU: [200000], memory: [752056999940000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d",} is_draining: 0 is_idle: 1 Cluster resources: node id: 688648627895828852{"total":{object_store_memory: 21474836480000, node:__internal_head__: 10000, node:192.168.0.2: 10000, accelerator_type:A40: 10000, memory: 752056999940000, GPU: 20000, CPU: 200000}}, "available": {object_store_memory: 21474836480000, node:192.168.0.2: 10000, accelerator_type:A40: 10000, memory: 752056999940000, GPU: 20000, node:__internal_head__: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} |
|
[state-dump] Waiting tasks size: 0 |
|
[state-dump] Number of executing tasks: 0 |
|
[state-dump] Number of pinned task arguments: 0 |
|
[state-dump] Number of total spilled tasks: 0 |
|
[state-dump] Number of spilled waiting tasks: 0 |
|
[state-dump] Number of spilled unschedulable tasks: 0 |
|
[state-dump] Resource usage { |
|
[state-dump] } |
|
[state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: |
|
[state-dump] |
|
[state-dump] Running tasks by scheduling class: |
|
[state-dump] ================================================== |
|
[state-dump] |
|
[state-dump] ClusterResources: |
|
[state-dump] LocalObjectManager: |
|
[state-dump] - num pinned objects: 0 |
|
[state-dump] - pinned objects size: 0 |
|
[state-dump] - num objects pending restore: 0 |
|
[state-dump] - num objects pending spill: 0 |
|
[state-dump] - num bytes pending spill: 0 |
|
[state-dump] - num bytes currently spilled: 0 |
|
[state-dump] - cumulative spill requests: 0 |
|
[state-dump] - cumulative restore requests: 0 |
|
[state-dump] - spilled objects pending delete: 0 |
|
[state-dump] |
|
[state-dump] ObjectManager: |
|
[state-dump] - num local objects: 0 |
|
[state-dump] - num unfulfilled push requests: 0 |
|
[state-dump] - num object pull requests: 0 |
|
[state-dump] - num chunks received total: 0 |
|
[state-dump] - num chunks received failed (all): 0 |
|
[state-dump] - num chunks received failed / cancelled: 0 |
|
[state-dump] - num chunks received failed / plasma error: 0 |
|
[state-dump] Event stats: |
|
[state-dump] Global stats: 0 total (0 active) |
|
[state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] Execution time: mean = -nan s, total = 0.000 s |
|
[state-dump] Event stats: |
|
[state-dump] PushManager: |
|
[state-dump] - num pushes in flight: 0 |
|
[state-dump] - num chunks in flight: 0 |
|
[state-dump] - num chunks remaining: 0 |
|
[state-dump] - max chunks allowed: 409 |
|
[state-dump] OwnershipBasedObjectDirectory: |
|
[state-dump] - num listeners: 0 |
|
[state-dump] - cumulative location updates: 0 |
|
[state-dump] - num location updates per second: 0.000 |
|
[state-dump] - num location lookups per second: 0.000 |
|
[state-dump] - num locations added per second: 0.000 |
|
[state-dump] - num locations removed per second: 0.000 |
|
[state-dump] BufferPool: |
|
[state-dump] - create buffer state map size: 0 |
|
[state-dump] PullManager: |
|
[state-dump] - num bytes available for pulled objects: 2147483648 |
|
[state-dump] - num bytes being pulled (all): 0 |
|
[state-dump] - num bytes being pulled / pinned: 0 |
|
[state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - first get request bundle: N/A |
|
[state-dump] - first wait request bundle: N/A |
|
[state-dump] - first task request bundle: N/A |
|
[state-dump] - num objects queued: 0 |
|
[state-dump] - num objects actively pulled (all): 0 |
|
[state-dump] - num objects actively pulled / pinned: 0 |
|
[state-dump] - num bundles being pulled: 0 |
|
[state-dump] - num pull retries: 0 |
|
[state-dump] - max timeout seconds: 0 |
|
[state-dump] - max timeout request is already processed. No entry. |
|
[state-dump] |
|
[state-dump] WorkerPool: |
|
[state-dump] - registered jobs: 1 |
|
[state-dump] - process_failed_job_config_missing: 0 |
|
[state-dump] - process_failed_rate_limited: 0 |
|
[state-dump] - process_failed_pending_registration: 0 |
|
[state-dump] - process_failed_runtime_env_setup_failed: 0 |
|
[state-dump] - num PYTHON workers: 20 |
|
[state-dump] - num PYTHON drivers: 1 |
|
[state-dump] - num PYTHON pending start requests: 0 |
|
[state-dump] - num PYTHON pending registration requests: 0 |
|
[state-dump] - num object spill callbacks queued: 0 |
|
[state-dump] - num object restore queued: 0 |
|
[state-dump] - num util functions queued: 0 |
|
[state-dump] - num idle workers: 20 |
|
[state-dump] TaskDependencyManager: |
|
[state-dump] - task deps map size: 0 |
|
[state-dump] - get req map size: 0 |
|
[state-dump] - wait req map size: 0 |
|
[state-dump] - local objects map size: 0 |
|
[state-dump] WaitManager: |
|
[state-dump] - num active wait requests: 0 |
|
[state-dump] Subscriber: |
|
[state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] Channel WORKER_REF_REMOVED_CHANNEL |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] Channel WORKER_OBJECT_EVICTION |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] num async plasma notifications: 0 |
|
[state-dump] Remote node managers: |
|
[state-dump] Event stats: |
|
[state-dump] Global stats: 21251 total (35 active) |
|
[state-dump] Queueing time: mean = 17.059 ms, max = 122.187 s, min = 67.000 ns, total = 362.531 s |
|
[state-dump] Execution time: mean = 252.658 us, total = 5.369 s |
|
[state-dump] Event stats: |
|
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 5034 total (0 active), Execution time: mean = 451.659 us, total = 2.274 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 5034 total (0 active), Execution time: mean = 31.225 us, total = 157.188 ms, Queueing time: mean = 89.117 us, max = 1.474 ms, min = 4.142 us, total = 448.613 ms |
|
[state-dump] RaySyncer.OnDemandBroadcasting - 2399 total (1 active), Execution time: mean = 9.384 us, total = 22.511 ms, Queueing time: mean = 77.282 us, max = 2.543 ms, min = 8.344 us, total = 185.399 ms |
|
[state-dump] ObjectManager.UpdateAvailableMemory - 2399 total (0 active), Execution time: mean = 4.510 us, total = 10.819 ms, Queueing time: mean = 83.652 us, max = 9.283 ms, min = 3.503 us, total = 200.681 ms |
|
[state-dump] NodeManager.CheckGC - 2399 total (1 active), Execution time: mean = 2.842 us, total = 6.818 ms, Queueing time: mean = 82.973 us, max = 2.553 ms, min = 6.447 us, total = 199.053 ms |
|
[state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 1200 total (1 active), Execution time: mean = 15.923 us, total = 19.107 ms, Queueing time: mean = 60.371 us, max = 992.162 us, min = 9.895 us, total = 72.445 ms |
|
[state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 959 total (1 active), Execution time: mean = 436.845 us, total = 418.934 ms, Queueing time: mean = 66.606 us, max = 3.232 ms, min = 8.760 us, total = 63.875 ms |
|
[state-dump] NodeManager.ScheduleAndDispatchTasks - 240 total (1 active), Execution time: mean = 13.577 us, total = 3.258 ms, Queueing time: mean = 71.556 us, max = 2.272 ms, min = 17.139 us, total = 17.173 ms |
|
[state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 240 total (0 active), Execution time: mean = 101.277 us, total = 24.306 ms, Queueing time: mean = 90.706 us, max = 238.952 us, min = 19.563 us, total = 21.769 ms |
|
[state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 240 total (1 active), Execution time: mean = 2.875 us, total = 690.087 us, Queueing time: mean = 158.323 us, max = 1.982 ms, min = 7.617 us, total = 37.998 ms |
|
[state-dump] NodeManager.deadline_timer.flush_free_objects - 240 total (1 active), Execution time: mean = 7.294 us, total = 1.750 ms, Queueing time: mean = 155.267 us, max = 1.983 ms, min = 10.158 us, total = 37.264 ms |
|
[state-dump] NodeManagerService.grpc_server.GetResourceLoad - 240 total (0 active), Execution time: mean = 583.887 us, total = 140.133 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ClientConnection.async_read.ProcessMessageHeader - 90 total (21 active), Execution time: mean = 7.335 us, total = 660.188 us, Queueing time: mean = 4.012 s, max = 122.187 s, min = 27.575 us, total = 361.095 s |
|
[state-dump] ClusterResourceManager.ResetRemoteNodeView - 81 total (1 active), Execution time: mean = 7.614 us, total = 616.711 us, Queueing time: mean = 65.714 us, max = 253.106 us, min = 14.828 us, total = 5.323 ms |
|
[state-dump] ClientConnection.async_read.ProcessMessage - 69 total (0 active), Execution time: mean = 875.203 us, total = 60.389 ms, Queueing time: mean = 70.019 us, max = 1.027 ms, min = 2.835 us, total = 4.831 ms |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 48 total (0 active), Execution time: mean = 1.208 ms, total = 58.003 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.GcsCheckAlive - 48 total (1 active), Execution time: mean = 230.108 us, total = 11.045 ms, Queueing time: mean = 558.068 us, max = 1.871 ms, min = 116.533 us, total = 26.787 ms |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 48 total (0 active), Execution time: mean = 44.230 us, total = 2.123 ms, Queueing time: mean = 92.538 us, max = 223.007 us, min = 15.566 us, total = 4.442 ms |
|
[state-dump] NodeManager.deadline_timer.record_metrics - 48 total (1 active), Execution time: mean = 509.527 us, total = 24.457 ms, Queueing time: mean = 287.377 us, max = 1.453 ms, min = 9.061 us, total = 13.794 ms |
|
[state-dump] NodeManager.deadline_timer.debug_state_dump - 24 total (1 active), Execution time: mean = 1.523 ms, total = 36.549 ms, Queueing time: mean = 51.584 us, max = 120.559 us, min = 11.928 us, total = 1.238 ms |
|
[state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.317 us, total = 28.982 us, Queueing time: mean = 49.767 us, max = 431.510 us, min = 17.047 us, total = 1.095 ms |
|
[state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 11.466 us, total = 240.795 us, Queueing time: mean = 2.139 ms, max = 21.287 ms, min = 13.920 us, total = 44.925 ms |
|
[state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 154.352 us, total = 3.241 ms, Queueing time: mean = 162.625 us, max = 432.570 us, min = 33.451 us, total = 3.415 ms |
|
[state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 21.296 us, total = 447.224 us, Queueing time: mean = 186.718 us, max = 583.430 us, min = 33.345 us, total = 3.921 ms |
|
[state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.630 ms, total = 34.233 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 217.267 us, total = 2.824 ms, Queueing time: mean = 2.965 ms, max = 8.958 ms, min = 32.982 us, total = 38.551 ms |
|
[state-dump] RaySyncer.BroadcastMessage - 6 total (0 active), Execution time: mean = 164.357 us, total = 986.142 us, Queueing time: mean = 501.500 ns, max = 672.000 ns, min = 148.000 ns, total = 3.009 us |
|
[state-dump] - 6 total (0 active), Execution time: mean = 958.667 ns, total = 5.752 us, Queueing time: mean = 71.981 us, max = 153.432 us, min = 25.031 us, total = 431.888 us |
|
[state-dump] NodeManager.deadline_timer.print_event_loop_stats - 4 total (1 active, 1 running), Execution time: mean = 2.018 ms, total = 8.071 ms, Queueing time: mean = 26.289 us, max = 61.952 us, min = 17.957 us, total = 105.157 us |
|
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 3 total (0 active), Execution time: mean = 249.818 us, total = 749.455 us, Queueing time: mean = 85.760 us, max = 112.184 us, min = 37.134 us, total = 257.281 us |
|
[state-dump] NodeManagerService.grpc_server.ReturnWorker - 3 total (0 active), Execution time: mean = 538.722 us, total = 1.616 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 3 total (0 active), Execution time: mean = 87.640 us, total = 262.921 us, Queueing time: mean = 62.702 us, max = 140.746 us, min = 21.653 us, total = 188.106 us |
|
[state-dump] WorkerPool.PopWorkerCallback - 3 total (0 active), Execution time: mean = 43.430 us, total = 130.291 us, Queueing time: mean = 30.363 us, max = 38.510 us, min = 20.335 us, total = 91.088 us |
|
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 3 total (0 active), Execution time: mean = 758.717 us, total = 2.276 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.083 us, total = 4.165 us, Queueing time: mean = 301.000 ns, max = 535.000 ns, min = 67.000 ns, total = 602.000 ns |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.377 ms, total = 2.754 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 504.334 ms, total = 1.009 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 129.555 us, total = 259.110 us, Queueing time: mean = 655.112 us, max = 1.180 ms, min = 129.843 us, total = 1.310 ms |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.419 ms, total = 2.419 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 302.707 us, total = 302.707 us, Queueing time: mean = 113.841 us, max = 113.841 us, min = 113.841 us, total = 113.841 us |
|
[state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 96.544 us, total = 96.544 us, Queueing time: mean = 315.750 us, max = 315.750 us, min = 315.750 us, total = 315.750 us |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 342.194 us, total = 342.194 us, Queueing time: mean = 163.766 us, max = 163.766 us, min = 163.766 us, total = 163.766 us |
|
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.019 s, total = 1.019 s, Queueing time: mean = 90.737 us, max = 90.737 us, min = 90.737 us, total = 90.737 us |
|
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.553 ms, total = 1.553 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.873 ms, total = 1.873 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 29.991 us, total = 29.991 us, Queueing time: mean = 111.550 us, max = 111.550 us, min = 111.550 us, total = 111.550 us |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.569 ms, total = 1.569 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.597 ms, total = 1.597 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 224.222 us, total = 224.222 us, Queueing time: mean = 119.308 us, max = 119.308 us, min = 119.308 us, total = 119.308 us |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 137.575 us, total = 137.575 us, Queueing time: mean = 36.079 us, max = 36.079 us, min = 36.079 us, total = 36.079 us |
|
[state-dump] DebugString() time ms: 1 |
|
[state-dump] |
|
[state-dump] |
|
[2025-01-21 05:52:35,722 I 18747 18775] (raylet) store.cc:564: Plasma store debug dump: |
|
Current usage: 0 / 2.14748 GB |
|
- num bytes created total: 168 |
|
0 pending objects of total size 0MB |
|
- objects spillable: 0 |
|
- bytes spillable: 0 |
|
- objects unsealed: 0 |
|
- bytes unsealed: 0 |
|
- objects in use: 0 |
|
- bytes in use: 0 |
|
- objects evictable: 0 |
|
- bytes evictable: 0 |
|
|
|
- objects created by worker: 0 |
|
- bytes created by worker: 0 |
|
- objects restored: 0 |
|
- bytes restored: 0 |
|
- objects received: 0 |
|
- bytes received: 0 |
|
- objects errored: 0 |
|
- bytes errored: 0 |
|
|
|
[2025-01-21 05:52:36,755 I 18747 18747] (raylet) node_manager.cc:525: [state-dump] NodeManager: |
|
[state-dump] Node ID: 381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d |
|
[state-dump] Node name: 192.168.0.2 |
|
[state-dump] InitialConfigResources: {object_store_memory: 21474836480000, node:192.168.0.2: 10000, node:__internal_head__: 10000, memory: 752056999940000, accelerator_type:A40: 10000, GPU: 20000, CPU: 200000} |
|
[state-dump] ClusterTaskManager: |
|
[state-dump] ========== Node: 381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d ================= |
|
[state-dump] Infeasible queue length: 0 |
|
[state-dump] Schedule queue length: 0 |
|
[state-dump] Dispatch queue length: 0 |
|
[state-dump] num_waiting_for_resource: 0 |
|
[state-dump] num_waiting_for_plasma_memory: 0 |
|
[state-dump] num_waiting_for_remote_node_resources: 0 |
|
[state-dump] num_worker_not_started_by_job_config_not_exist: 0 |
|
[state-dump] num_worker_not_started_by_registration_timeout: 0 |
|
[state-dump] num_tasks_waiting_for_workers: 0 |
|
[state-dump] num_cancelled_tasks: 0 |
|
[state-dump] cluster_resource_scheduler state: |
|
[state-dump] Local id: 688648627895828852 Local resources: {"total":{node:__internal_head__: [10000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], CPU: [200000], memory: [752056999940000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {node:__internal_head__: [10000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], CPU: [200000], memory: [752056999940000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d",} is_draining: 0 is_idle: 1 Cluster resources: node id: 688648627895828852{"total":{object_store_memory: 21474836480000, node:__internal_head__: 10000, node:192.168.0.2: 10000, accelerator_type:A40: 10000, memory: 752056999940000, GPU: 20000, CPU: 200000}}, "available": {object_store_memory: 21474836480000, node:192.168.0.2: 10000, accelerator_type:A40: 10000, memory: 752056999940000, GPU: 20000, node:__internal_head__: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} |
|
[state-dump] Waiting tasks size: 0 |
|
[state-dump] Number of executing tasks: 0 |
|
[state-dump] Number of pinned task arguments: 0 |
|
[state-dump] Number of total spilled tasks: 0 |
|
[state-dump] Number of spilled waiting tasks: 0 |
|
[state-dump] Number of spilled unschedulable tasks: 0 |
|
[state-dump] Resource usage { |
|
[state-dump] } |
|
[state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: |
|
[state-dump] |
|
[state-dump] Running tasks by scheduling class: |
|
[state-dump] ================================================== |
|
[state-dump] |
|
[state-dump] ClusterResources: |
|
[state-dump] LocalObjectManager: |
|
[state-dump] - num pinned objects: 0 |
|
[state-dump] - pinned objects size: 0 |
|
[state-dump] - num objects pending restore: 0 |
|
[state-dump] - num objects pending spill: 0 |
|
[state-dump] - num bytes pending spill: 0 |
|
[state-dump] - num bytes currently spilled: 0 |
|
[state-dump] - cumulative spill requests: 0 |
|
[state-dump] - cumulative restore requests: 0 |
|
[state-dump] - spilled objects pending delete: 0 |
|
[state-dump] |
|
[state-dump] ObjectManager: |
|
[state-dump] - num local objects: 0 |
|
[state-dump] - num unfulfilled push requests: 0 |
|
[state-dump] - num object pull requests: 0 |
|
[state-dump] - num chunks received total: 0 |
|
[state-dump] - num chunks received failed (all): 0 |
|
[state-dump] - num chunks received failed / cancelled: 0 |
|
[state-dump] - num chunks received failed / plasma error: 0 |
|
[state-dump] Event stats: |
|
[state-dump] Global stats: 0 total (0 active) |
|
[state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] Execution time: mean = -nan s, total = 0.000 s |
|
[state-dump] Event stats: |
|
[state-dump] PushManager: |
|
[state-dump] - num pushes in flight: 0 |
|
[state-dump] - num chunks in flight: 0 |
|
[state-dump] - num chunks remaining: 0 |
|
[state-dump] - max chunks allowed: 409 |
|
[state-dump] OwnershipBasedObjectDirectory: |
|
[state-dump] - num listeners: 0 |
|
[state-dump] - cumulative location updates: 0 |
|
[state-dump] - num location updates per second: 0.000 |
|
[state-dump] - num location lookups per second: 0.000 |
|
[state-dump] - num locations added per second: 0.000 |
|
[state-dump] - num locations removed per second: 0.000 |
|
[state-dump] BufferPool: |
|
[state-dump] - create buffer state map size: 0 |
|
[state-dump] PullManager: |
|
[state-dump] - num bytes available for pulled objects: 2147483648 |
|
[state-dump] - num bytes being pulled (all): 0 |
|
[state-dump] - num bytes being pulled / pinned: 0 |
|
[state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - first get request bundle: N/A |
|
[state-dump] - first wait request bundle: N/A |
|
[state-dump] - first task request bundle: N/A |
|
[state-dump] - num objects queued: 0 |
|
[state-dump] - num objects actively pulled (all): 0 |
|
[state-dump] - num objects actively pulled / pinned: 0 |
|
[state-dump] - num bundles being pulled: 0 |
|
[state-dump] - num pull retries: 0 |
|
[state-dump] - max timeout seconds: 0 |
|
[state-dump] - max timeout request is already processed. No entry. |
|
[state-dump] |
|
[state-dump] WorkerPool: |
|
[state-dump] - registered jobs: 1 |
|
[state-dump] - process_failed_job_config_missing: 0 |
|
[state-dump] - process_failed_rate_limited: 0 |
|
[state-dump] - process_failed_pending_registration: 0 |
|
[state-dump] - process_failed_runtime_env_setup_failed: 0 |
|
[state-dump] - num PYTHON workers: 20 |
|
[state-dump] - num PYTHON drivers: 1 |
|
[state-dump] - num PYTHON pending start requests: 0 |
|
[state-dump] - num PYTHON pending registration requests: 0 |
|
[state-dump] - num object spill callbacks queued: 0 |
|
[state-dump] - num object restore queued: 0 |
|
[state-dump] - num util functions queued: 0 |
|
[state-dump] - num idle workers: 20 |
|
[state-dump] TaskDependencyManager: |
|
[state-dump] - task deps map size: 0 |
|
[state-dump] - get req map size: 0 |
|
[state-dump] - wait req map size: 0 |
|
[state-dump] - local objects map size: 0 |
|
[state-dump] WaitManager: |
|
[state-dump] - num active wait requests: 0 |
|
[state-dump] Subscriber: |
|
[state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] Channel WORKER_REF_REMOVED_CHANNEL |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] Channel WORKER_OBJECT_EVICTION |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] num async plasma notifications: 0 |
|
[state-dump] Remote node managers: |
|
[state-dump] Event stats: |
|
[state-dump] Global stats: 26476 total (35 active) |
|
[state-dump] Queueing time: mean = 13.707 ms, max = 122.187 s, min = 67.000 ns, total = 362.919 s |
|
[state-dump] Execution time: mean = 237.229 us, total = 6.281 s |
|
[state-dump] Event stats: |
|
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 6291 total (0 active), Execution time: mean = 466.153 us, total = 2.933 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 6291 total (0 active), Execution time: mean = 32.918 us, total = 207.086 ms, Queueing time: mean = 93.420 us, max = 2.139 ms, min = 4.142 us, total = 587.708 ms |
|
[state-dump] RaySyncer.OnDemandBroadcasting - 2998 total (1 active), Execution time: mean = 9.180 us, total = 27.523 ms, Queueing time: mean = 77.559 us, max = 2.543 ms, min = 8.344 us, total = 232.523 ms |
|
[state-dump] ObjectManager.UpdateAvailableMemory - 2998 total (0 active), Execution time: mean = 4.646 us, total = 13.930 ms, Queueing time: mean = 87.895 us, max = 9.283 ms, min = 3.503 us, total = 263.508 ms |
|
[state-dump] NodeManager.CheckGC - 2998 total (1 active), Execution time: mean = 2.810 us, total = 8.425 ms, Queueing time: mean = 83.080 us, max = 2.553 ms, min = 6.447 us, total = 249.075 ms |
|
[state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 1500 total (1 active), Execution time: mean = 15.731 us, total = 23.596 ms, Queueing time: mean = 61.784 us, max = 992.162 us, min = 9.895 us, total = 92.676 ms |
|
[state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 1198 total (1 active), Execution time: mean = 433.681 us, total = 519.550 ms, Queueing time: mean = 67.675 us, max = 3.232 ms, min = 8.760 us, total = 81.075 ms |
|
[state-dump] NodeManager.ScheduleAndDispatchTasks - 300 total (1 active), Execution time: mean = 13.671 us, total = 4.101 ms, Queueing time: mean = 73.120 us, max = 2.272 ms, min = 17.139 us, total = 21.936 ms |
|
[state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 300 total (0 active), Execution time: mean = 102.318 us, total = 30.696 ms, Queueing time: mean = 94.335 us, max = 238.952 us, min = 19.563 us, total = 28.301 ms |
|
[state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 300 total (1 active), Execution time: mean = 2.945 us, total = 883.350 us, Queueing time: mean = 165.475 us, max = 2.205 ms, min = 7.617 us, total = 49.642 ms |
|
[state-dump] NodeManager.deadline_timer.flush_free_objects - 300 total (1 active), Execution time: mean = 7.519 us, total = 2.256 ms, Queueing time: mean = 162.272 us, max = 2.209 ms, min = 10.158 us, total = 48.682 ms |
|
[state-dump] NodeManagerService.grpc_server.GetResourceLoad - 300 total (0 active), Execution time: mean = 595.079 us, total = 178.524 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ClusterResourceManager.ResetRemoteNodeView - 101 total (1 active), Execution time: mean = 7.740 us, total = 781.735 us, Queueing time: mean = 72.301 us, max = 253.106 us, min = 14.828 us, total = 7.302 ms |
|
[state-dump] ClientConnection.async_read.ProcessMessageHeader - 90 total (21 active), Execution time: mean = 7.335 us, total = 660.188 us, Queueing time: mean = 4.012 s, max = 122.187 s, min = 27.575 us, total = 361.095 s |
|
[state-dump] ClientConnection.async_read.ProcessMessage - 69 total (0 active), Execution time: mean = 875.203 us, total = 60.389 ms, Queueing time: mean = 70.019 us, max = 1.027 ms, min = 2.835 us, total = 4.831 ms |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 60 total (0 active), Execution time: mean = 1.255 ms, total = 75.274 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.GcsCheckAlive - 60 total (1 active), Execution time: mean = 239.348 us, total = 14.361 ms, Queueing time: mean = 585.093 us, max = 2.274 ms, min = 116.533 us, total = 35.106 ms |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 60 total (0 active), Execution time: mean = 45.816 us, total = 2.749 ms, Queueing time: mean = 89.352 us, max = 223.007 us, min = 15.566 us, total = 5.361 ms |
|
[state-dump] NodeManager.deadline_timer.record_metrics - 60 total (1 active), Execution time: mean = 518.482 us, total = 31.109 ms, Queueing time: mean = 312.303 us, max = 1.700 ms, min = 9.061 us, total = 18.738 ms |
|
[state-dump] NodeManager.deadline_timer.debug_state_dump - 30 total (1 active), Execution time: mean = 1.589 ms, total = 47.675 ms, Queueing time: mean = 50.456 us, max = 120.559 us, min = 11.928 us, total = 1.514 ms |
|
[state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.317 us, total = 28.982 us, Queueing time: mean = 49.767 us, max = 431.510 us, min = 17.047 us, total = 1.095 ms |
|
[state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 11.466 us, total = 240.795 us, Queueing time: mean = 2.139 ms, max = 21.287 ms, min = 13.920 us, total = 44.925 ms |
|
[state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 154.352 us, total = 3.241 ms, Queueing time: mean = 162.625 us, max = 432.570 us, min = 33.451 us, total = 3.415 ms |
|
[state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 21.296 us, total = 447.224 us, Queueing time: mean = 186.718 us, max = 583.430 us, min = 33.345 us, total = 3.921 ms |
|
[state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.630 ms, total = 34.233 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 217.267 us, total = 2.824 ms, Queueing time: mean = 2.965 ms, max = 8.958 ms, min = 32.982 us, total = 38.551 ms |
|
[state-dump] RaySyncer.BroadcastMessage - 6 total (0 active), Execution time: mean = 164.357 us, total = 986.142 us, Queueing time: mean = 501.500 ns, max = 672.000 ns, min = 148.000 ns, total = 3.009 us |
|
[state-dump] - 6 total (0 active), Execution time: mean = 958.667 ns, total = 5.752 us, Queueing time: mean = 71.981 us, max = 153.432 us, min = 25.031 us, total = 431.888 us |
|
[state-dump] NodeManager.deadline_timer.print_event_loop_stats - 5 total (1 active, 1 running), Execution time: mean = 2.118 ms, total = 10.588 ms, Queueing time: mean = 26.730 us, max = 61.952 us, min = 17.957 us, total = 133.649 us |
|
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 3 total (0 active), Execution time: mean = 249.818 us, total = 749.455 us, Queueing time: mean = 85.760 us, max = 112.184 us, min = 37.134 us, total = 257.281 us |
|
[state-dump] NodeManagerService.grpc_server.ReturnWorker - 3 total (0 active), Execution time: mean = 538.722 us, total = 1.616 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 3 total (0 active), Execution time: mean = 87.640 us, total = 262.921 us, Queueing time: mean = 62.702 us, max = 140.746 us, min = 21.653 us, total = 188.106 us |
|
[state-dump] WorkerPool.PopWorkerCallback - 3 total (0 active), Execution time: mean = 43.430 us, total = 130.291 us, Queueing time: mean = 30.363 us, max = 38.510 us, min = 20.335 us, total = 91.088 us |
|
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 3 total (0 active), Execution time: mean = 758.717 us, total = 2.276 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.083 us, total = 4.165 us, Queueing time: mean = 301.000 ns, max = 535.000 ns, min = 67.000 ns, total = 602.000 ns |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.377 ms, total = 2.754 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 504.334 ms, total = 1.009 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 129.555 us, total = 259.110 us, Queueing time: mean = 655.112 us, max = 1.180 ms, min = 129.843 us, total = 1.310 ms |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.419 ms, total = 2.419 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 302.707 us, total = 302.707 us, Queueing time: mean = 113.841 us, max = 113.841 us, min = 113.841 us, total = 113.841 us |
|
[state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 96.544 us, total = 96.544 us, Queueing time: mean = 315.750 us, max = 315.750 us, min = 315.750 us, total = 315.750 us |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 342.194 us, total = 342.194 us, Queueing time: mean = 163.766 us, max = 163.766 us, min = 163.766 us, total = 163.766 us |
|
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.019 s, total = 1.019 s, Queueing time: mean = 90.737 us, max = 90.737 us, min = 90.737 us, total = 90.737 us |
|
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.553 ms, total = 1.553 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.873 ms, total = 1.873 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 29.991 us, total = 29.991 us, Queueing time: mean = 111.550 us, max = 111.550 us, min = 111.550 us, total = 111.550 us |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.569 ms, total = 1.569 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.597 ms, total = 1.597 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 224.222 us, total = 224.222 us, Queueing time: mean = 119.308 us, max = 119.308 us, min = 119.308 us, total = 119.308 us |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 137.575 us, total = 137.575 us, Queueing time: mean = 36.079 us, max = 36.079 us, min = 36.079 us, total = 36.079 us |
|
[state-dump] DebugString() time ms: 2 |
|
[state-dump] |
|
[state-dump] |
|
[2025-01-21 05:53:35,723 I 18747 18775] (raylet) store.cc:564: Plasma store debug dump: |
|
Current usage: 0 / 2.14748 GB |
|
- num bytes created total: 168 |
|
0 pending objects of total size 0MB |
|
- objects spillable: 0 |
|
- bytes spillable: 0 |
|
- objects unsealed: 0 |
|
- bytes unsealed: 0 |
|
- objects in use: 0 |
|
- bytes in use: 0 |
|
- objects evictable: 0 |
|
- bytes evictable: 0 |
|
|
|
- objects created by worker: 0 |
|
- bytes created by worker: 0 |
|
- objects restored: 0 |
|
- bytes restored: 0 |
|
- objects received: 0 |
|
- bytes received: 0 |
|
- objects errored: 0 |
|
- bytes errored: 0 |
|
|
|
[2025-01-21 05:53:36,758 I 18747 18747] (raylet) node_manager.cc:525: [state-dump] NodeManager: |
|
[state-dump] Node ID: 381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d |
|
[state-dump] Node name: 192.168.0.2 |
|
[state-dump] InitialConfigResources: {object_store_memory: 21474836480000, node:192.168.0.2: 10000, node:__internal_head__: 10000, memory: 752056999940000, accelerator_type:A40: 10000, GPU: 20000, CPU: 200000} |
|
[state-dump] ClusterTaskManager: |
|
[state-dump] ========== Node: 381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d ================= |
|
[state-dump] Infeasible queue length: 0 |
|
[state-dump] Schedule queue length: 0 |
|
[state-dump] Dispatch queue length: 0 |
|
[state-dump] num_waiting_for_resource: 0 |
|
[state-dump] num_waiting_for_plasma_memory: 0 |
|
[state-dump] num_waiting_for_remote_node_resources: 0 |
|
[state-dump] num_worker_not_started_by_job_config_not_exist: 0 |
|
[state-dump] num_worker_not_started_by_registration_timeout: 0 |
|
[state-dump] num_tasks_waiting_for_workers: 0 |
|
[state-dump] num_cancelled_tasks: 0 |
|
[state-dump] cluster_resource_scheduler state: |
|
[state-dump] Local id: 688648627895828852 Local resources: {"total":{node:__internal_head__: [10000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], CPU: [200000], memory: [752056999940000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {node:__internal_head__: [10000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], CPU: [200000], memory: [752056999940000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d",} is_draining: 0 is_idle: 1 Cluster resources: node id: 688648627895828852{"total":{object_store_memory: 21474836480000, node:__internal_head__: 10000, node:192.168.0.2: 10000, memory: 752056999940000, accelerator_type:A40: 10000, GPU: 20000, CPU: 200000}}, "available": {memory: 752056999940000, node:192.168.0.2: 10000, node:__internal_head__: 10000, accelerator_type:A40: 10000, CPU: 200000, GPU: 20000, object_store_memory: 21474836480000}}, "labels":{"ray.io/node_id":"381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} |
|
[state-dump] Waiting tasks size: 0 |
|
[state-dump] Number of executing tasks: 0 |
|
[state-dump] Number of pinned task arguments: 0 |
|
[state-dump] Number of total spilled tasks: 0 |
|
[state-dump] Number of spilled waiting tasks: 0 |
|
[state-dump] Number of spilled unschedulable tasks: 0 |
|
[state-dump] Resource usage { |
|
[state-dump] } |
|
[state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: |
|
[state-dump] |
|
[state-dump] Running tasks by scheduling class: |
|
[state-dump] ================================================== |
|
[state-dump] |
|
[state-dump] ClusterResources: |
|
[state-dump] LocalObjectManager: |
|
[state-dump] - num pinned objects: 0 |
|
[state-dump] - pinned objects size: 0 |
|
[state-dump] - num objects pending restore: 0 |
|
[state-dump] - num objects pending spill: 0 |
|
[state-dump] - num bytes pending spill: 0 |
|
[state-dump] - num bytes currently spilled: 0 |
|
[state-dump] - cumulative spill requests: 0 |
|
[state-dump] - cumulative restore requests: 0 |
|
[state-dump] - spilled objects pending delete: 0 |
|
[state-dump] |
|
[state-dump] ObjectManager: |
|
[state-dump] - num local objects: 0 |
|
[state-dump] - num unfulfilled push requests: 0 |
|
[state-dump] - num object pull requests: 0 |
|
[state-dump] - num chunks received total: 0 |
|
[state-dump] - num chunks received failed (all): 0 |
|
[state-dump] - num chunks received failed / cancelled: 0 |
|
[state-dump] - num chunks received failed / plasma error: 0 |
|
[state-dump] Event stats: |
|
[state-dump] Global stats: 0 total (0 active) |
|
[state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] Execution time: mean = -nan s, total = 0.000 s |
|
[state-dump] Event stats: |
|
[state-dump] PushManager: |
|
[state-dump] - num pushes in flight: 0 |
|
[state-dump] - num chunks in flight: 0 |
|
[state-dump] - num chunks remaining: 0 |
|
[state-dump] - max chunks allowed: 409 |
|
[state-dump] OwnershipBasedObjectDirectory: |
|
[state-dump] - num listeners: 0 |
|
[state-dump] - cumulative location updates: 0 |
|
[state-dump] - num location updates per second: 0.000 |
|
[state-dump] - num location lookups per second: 0.000 |
|
[state-dump] - num locations added per second: 0.000 |
|
[state-dump] - num locations removed per second: 0.000 |
|
[state-dump] BufferPool: |
|
[state-dump] - create buffer state map size: 0 |
|
[state-dump] PullManager: |
|
[state-dump] - num bytes available for pulled objects: 2147483648 |
|
[state-dump] - num bytes being pulled (all): 0 |
|
[state-dump] - num bytes being pulled / pinned: 0 |
|
[state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - first get request bundle: N/A |
|
[state-dump] - first wait request bundle: N/A |
|
[state-dump] - first task request bundle: N/A |
|
[state-dump] - num objects queued: 0 |
|
[state-dump] - num objects actively pulled (all): 0 |
|
[state-dump] - num objects actively pulled / pinned: 0 |
|
[state-dump] - num bundles being pulled: 0 |
|
[state-dump] - num pull retries: 0 |
|
[state-dump] - max timeout seconds: 0 |
|
[state-dump] - max timeout request is already processed. No entry. |
|
[state-dump] |
|
[state-dump] WorkerPool: |
|
[state-dump] - registered jobs: 1 |
|
[state-dump] - process_failed_job_config_missing: 0 |
|
[state-dump] - process_failed_rate_limited: 0 |
|
[state-dump] - process_failed_pending_registration: 0 |
|
[state-dump] - process_failed_runtime_env_setup_failed: 0 |
|
[state-dump] - num PYTHON workers: 20 |
|
[state-dump] - num PYTHON drivers: 1 |
|
[state-dump] - num PYTHON pending start requests: 0 |
|
[state-dump] - num PYTHON pending registration requests: 0 |
|
[state-dump] - num object spill callbacks queued: 0 |
|
[state-dump] - num object restore queued: 0 |
|
[state-dump] - num util functions queued: 0 |
|
[state-dump] - num idle workers: 20 |
|
[state-dump] TaskDependencyManager: |
|
[state-dump] - task deps map size: 0 |
|
[state-dump] - get req map size: 0 |
|
[state-dump] - wait req map size: 0 |
|
[state-dump] - local objects map size: 0 |
|
[state-dump] WaitManager: |
|
[state-dump] - num active wait requests: 0 |
|
[state-dump] Subscriber: |
|
[state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] Channel WORKER_REF_REMOVED_CHANNEL |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] Channel WORKER_OBJECT_EVICTION |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] num async plasma notifications: 0 |
|
[state-dump] Remote node managers: |
|
[state-dump] Event stats: |
|
[state-dump] Global stats: 31721 total (35 active) |
|
[state-dump] Queueing time: mean = 19.720 ms, max = 131.865 s, min = 67.000 ns, total = 625.538 s |
|
[state-dump] Execution time: mean = 228.713 us, total = 7.255 s |
|
[state-dump] Event stats: |
|
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 7551 total (0 active), Execution time: mean = 482.191 us, total = 3.641 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 7551 total (0 active), Execution time: mean = 34.518 us, total = 260.648 ms, Queueing time: mean = 97.348 us, max = 2.139 ms, min = 4.142 us, total = 735.076 ms |
|
[state-dump] RaySyncer.OnDemandBroadcasting - 3597 total (1 active), Execution time: mean = 9.363 us, total = 33.677 ms, Queueing time: mean = 79.008 us, max = 2.543 ms, min = 8.344 us, total = 284.192 ms |
|
[state-dump] ObjectManager.UpdateAvailableMemory - 3597 total (0 active), Execution time: mean = 4.823 us, total = 17.348 ms, Queueing time: mean = 92.894 us, max = 9.283 ms, min = 3.503 us, total = 334.139 ms |
|
[state-dump] NodeManager.CheckGC - 3597 total (1 active), Execution time: mean = 2.822 us, total = 10.152 ms, Queueing time: mean = 84.679 us, max = 2.553 ms, min = 6.447 us, total = 304.592 ms |
|
[state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 1800 total (1 active), Execution time: mean = 15.806 us, total = 28.452 ms, Queueing time: mean = 63.692 us, max = 992.162 us, min = 9.895 us, total = 114.646 ms |
|
[state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 1438 total (1 active), Execution time: mean = 434.757 us, total = 625.180 ms, Queueing time: mean = 68.548 us, max = 3.232 ms, min = 8.760 us, total = 98.572 ms |
|
[state-dump] NodeManager.ScheduleAndDispatchTasks - 360 total (1 active), Execution time: mean = 13.945 us, total = 5.020 ms, Queueing time: mean = 75.772 us, max = 2.272 ms, min = 17.139 us, total = 27.278 ms |
|
[state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 360 total (0 active), Execution time: mean = 103.185 us, total = 37.147 ms, Queueing time: mean = 97.448 us, max = 238.952 us, min = 19.563 us, total = 35.081 ms |
|
[state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 360 total (1 active), Execution time: mean = 2.989 us, total = 1.076 ms, Queueing time: mean = 167.568 us, max = 2.205 ms, min = 7.617 us, total = 60.324 ms |
|
[state-dump] NodeManager.deadline_timer.flush_free_objects - 360 total (1 active), Execution time: mean = 7.681 us, total = 2.765 ms, Queueing time: mean = 164.296 us, max = 2.209 ms, min = 10.158 us, total = 59.147 ms |
|
[state-dump] NodeManagerService.grpc_server.GetResourceLoad - 360 total (0 active), Execution time: mean = 606.817 us, total = 218.454 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ClusterResourceManager.ResetRemoteNodeView - 121 total (1 active), Execution time: mean = 7.741 us, total = 936.655 us, Queueing time: mean = 72.313 us, max = 253.106 us, min = 14.828 us, total = 8.750 ms |
|
[state-dump] ClientConnection.async_read.ProcessMessageHeader - 92 total (21 active), Execution time: mean = 7.533 us, total = 693.037 us, Queueing time: mean = 6.775 s, max = 131.865 s, min = 27.575 us, total = 623.301 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 72 total (0 active), Execution time: mean = 1.276 ms, total = 91.881 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.GcsCheckAlive - 72 total (1 active), Execution time: mean = 245.057 us, total = 17.644 ms, Queueing time: mean = 591.342 us, max = 2.274 ms, min = 116.533 us, total = 42.577 ms |
|
[state-dump] NodeManager.deadline_timer.record_metrics - 72 total (1 active), Execution time: mean = 518.874 us, total = 37.359 ms, Queueing time: mean = 323.447 us, max = 1.700 ms, min = 9.061 us, total = 23.288 ms |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 72 total (0 active), Execution time: mean = 46.604 us, total = 3.355 ms, Queueing time: mean = 95.927 us, max = 241.540 us, min = 15.566 us, total = 6.907 ms |
|
[state-dump] ClientConnection.async_read.ProcessMessage - 71 total (0 active), Execution time: mean = 851.189 us, total = 60.434 ms, Queueing time: mean = 68.562 us, max = 1.027 ms, min = 2.835 us, total = 4.868 ms |
|
[state-dump] NodeManager.deadline_timer.debug_state_dump - 36 total (1 active), Execution time: mean = 1.613 ms, total = 58.062 ms, Queueing time: mean = 59.346 us, max = 147.920 us, min = 11.928 us, total = 2.136 ms |
|
[state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.317 us, total = 28.982 us, Queueing time: mean = 49.767 us, max = 431.510 us, min = 17.047 us, total = 1.095 ms |
|
[state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 11.466 us, total = 240.795 us, Queueing time: mean = 2.139 ms, max = 21.287 ms, min = 13.920 us, total = 44.925 ms |
|
[state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 154.352 us, total = 3.241 ms, Queueing time: mean = 162.625 us, max = 432.570 us, min = 33.451 us, total = 3.415 ms |
|
[state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 21.296 us, total = 447.224 us, Queueing time: mean = 186.718 us, max = 583.430 us, min = 33.345 us, total = 3.921 ms |
|
[state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.630 ms, total = 34.233 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 217.267 us, total = 2.824 ms, Queueing time: mean = 2.965 ms, max = 8.958 ms, min = 32.982 us, total = 38.551 ms |
|
[state-dump] RaySyncer.BroadcastMessage - 8 total (0 active), Execution time: mean = 175.542 us, total = 1.404 ms, Queueing time: mean = 537.375 ns, max = 672.000 ns, min = 148.000 ns, total = 4.299 us |
|
[state-dump] - 8 total (0 active), Execution time: mean = 961.250 ns, total = 7.690 us, Queueing time: mean = 72.430 us, max = 153.432 us, min = 25.031 us, total = 579.438 us |
|
[state-dump] NodeManager.deadline_timer.print_event_loop_stats - 6 total (1 active, 1 running), Execution time: mean = 2.229 ms, total = 13.373 ms, Queueing time: mean = 29.768 us, max = 61.952 us, min = 17.957 us, total = 178.606 us |
|
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 4 total (0 active), Execution time: mean = 768.406 us, total = 3.074 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 4 total (0 active), Execution time: mean = 242.441 us, total = 969.764 us, Queueing time: mean = 89.175 us, max = 112.184 us, min = 37.134 us, total = 356.699 us |
|
[state-dump] NodeManagerService.grpc_server.ReturnWorker - 4 total (0 active), Execution time: mean = 541.409 us, total = 2.166 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 4 total (0 active), Execution time: mean = 95.265 us, total = 381.061 us, Queueing time: mean = 58.353 us, max = 140.746 us, min = 21.653 us, total = 233.411 us |
|
[state-dump] WorkerPool.PopWorkerCallback - 4 total (0 active), Execution time: mean = 46.346 us, total = 185.384 us, Queueing time: mean = 30.612 us, max = 38.510 us, min = 20.335 us, total = 122.448 us |
|
[state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.083 us, total = 4.165 us, Queueing time: mean = 301.000 ns, max = 535.000 ns, min = 67.000 ns, total = 602.000 ns |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.377 ms, total = 2.754 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 504.334 ms, total = 1.009 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 129.555 us, total = 259.110 us, Queueing time: mean = 655.112 us, max = 1.180 ms, min = 129.843 us, total = 1.310 ms |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.419 ms, total = 2.419 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 302.707 us, total = 302.707 us, Queueing time: mean = 113.841 us, max = 113.841 us, min = 113.841 us, total = 113.841 us |
|
[state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 96.544 us, total = 96.544 us, Queueing time: mean = 315.750 us, max = 315.750 us, min = 315.750 us, total = 315.750 us |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 342.194 us, total = 342.194 us, Queueing time: mean = 163.766 us, max = 163.766 us, min = 163.766 us, total = 163.766 us |
|
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.019 s, total = 1.019 s, Queueing time: mean = 90.737 us, max = 90.737 us, min = 90.737 us, total = 90.737 us |
|
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.553 ms, total = 1.553 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.873 ms, total = 1.873 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 29.991 us, total = 29.991 us, Queueing time: mean = 111.550 us, max = 111.550 us, min = 111.550 us, total = 111.550 us |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.569 ms, total = 1.569 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.597 ms, total = 1.597 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 224.222 us, total = 224.222 us, Queueing time: mean = 119.308 us, max = 119.308 us, min = 119.308 us, total = 119.308 us |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 137.575 us, total = 137.575 us, Queueing time: mean = 36.079 us, max = 36.079 us, min = 36.079 us, total = 36.079 us |
|
[state-dump] DebugString() time ms: 1 |
|
[state-dump] |
|
[state-dump] |
|
[2025-01-21 05:54:35,723 I 18747 18775] (raylet) store.cc:564: Plasma store debug dump: |
|
Current usage: 0 / 2.14748 GB |
|
- num bytes created total: 168 |
|
0 pending objects of total size 0MB |
|
- objects spillable: 0 |
|
- bytes spillable: 0 |
|
- objects unsealed: 0 |
|
- bytes unsealed: 0 |
|
- objects in use: 0 |
|
- bytes in use: 0 |
|
- objects evictable: 0 |
|
- bytes evictable: 0 |
|
|
|
- objects created by worker: 0 |
|
- bytes created by worker: 0 |
|
- objects restored: 0 |
|
- bytes restored: 0 |
|
- objects received: 0 |
|
- bytes received: 0 |
|
- objects errored: 0 |
|
- bytes errored: 0 |
|
|
|
[2025-01-21 05:54:36,760 I 18747 18747] (raylet) node_manager.cc:525: [state-dump] NodeManager: |
|
[state-dump] Node ID: 381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d |
|
[state-dump] Node name: 192.168.0.2 |
|
[state-dump] InitialConfigResources: {object_store_memory: 21474836480000, node:192.168.0.2: 10000, node:__internal_head__: 10000, memory: 752056999940000, accelerator_type:A40: 10000, GPU: 20000, CPU: 200000} |
|
[state-dump] ClusterTaskManager: |
|
[state-dump] ========== Node: 381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d ================= |
|
[state-dump] Infeasible queue length: 0 |
|
[state-dump] Schedule queue length: 0 |
|
[state-dump] Dispatch queue length: 0 |
|
[state-dump] num_waiting_for_resource: 0 |
|
[state-dump] num_waiting_for_plasma_memory: 0 |
|
[state-dump] num_waiting_for_remote_node_resources: 0 |
|
[state-dump] num_worker_not_started_by_job_config_not_exist: 0 |
|
[state-dump] num_worker_not_started_by_registration_timeout: 0 |
|
[state-dump] num_tasks_waiting_for_workers: 0 |
|
[state-dump] num_cancelled_tasks: 0 |
|
[state-dump] cluster_resource_scheduler state: |
|
[state-dump] Local id: 688648627895828852 Local resources: {"total":{node:__internal_head__: [10000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], CPU: [200000], memory: [752056999940000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {node:__internal_head__: [10000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], CPU: [200000], memory: [752056999940000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d",} is_draining: 0 is_idle: 1 Cluster resources: node id: 688648627895828852{"total":{object_store_memory: 21474836480000, node:__internal_head__: 10000, node:192.168.0.2: 10000, memory: 752056999940000, accelerator_type:A40: 10000, GPU: 20000, CPU: 200000}}, "available": {memory: 752056999940000, node:192.168.0.2: 10000, node:__internal_head__: 10000, accelerator_type:A40: 10000, CPU: 200000, GPU: 20000, object_store_memory: 21474836480000}}, "labels":{"ray.io/node_id":"381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} |
|
[state-dump] Waiting tasks size: 0 |
|
[state-dump] Number of executing tasks: 0 |
|
[state-dump] Number of pinned task arguments: 0 |
|
[state-dump] Number of total spilled tasks: 0 |
|
[state-dump] Number of spilled waiting tasks: 0 |
|
[state-dump] Number of spilled unschedulable tasks: 0 |
|
[state-dump] Resource usage { |
|
[state-dump] } |
|
[state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: |
|
[state-dump] |
|
[state-dump] Running tasks by scheduling class: |
|
[state-dump] ================================================== |
|
[state-dump] |
|
[state-dump] ClusterResources: |
|
[state-dump] LocalObjectManager: |
|
[state-dump] - num pinned objects: 0 |
|
[state-dump] - pinned objects size: 0 |
|
[state-dump] - num objects pending restore: 0 |
|
[state-dump] - num objects pending spill: 0 |
|
[state-dump] - num bytes pending spill: 0 |
|
[state-dump] - num bytes currently spilled: 0 |
|
[state-dump] - cumulative spill requests: 0 |
|
[state-dump] - cumulative restore requests: 0 |
|
[state-dump] - spilled objects pending delete: 0 |
|
[state-dump] |
|
[state-dump] ObjectManager: |
|
[state-dump] - num local objects: 0 |
|
[state-dump] - num unfulfilled push requests: 0 |
|
[state-dump] - num object pull requests: 0 |
|
[state-dump] - num chunks received total: 0 |
|
[state-dump] - num chunks received failed (all): 0 |
|
[state-dump] - num chunks received failed / cancelled: 0 |
|
[state-dump] - num chunks received failed / plasma error: 0 |
|
[state-dump] Event stats: |
|
[state-dump] Global stats: 0 total (0 active) |
|
[state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] Execution time: mean = -nan s, total = 0.000 s |
|
[state-dump] Event stats: |
|
[state-dump] PushManager: |
|
[state-dump] - num pushes in flight: 0 |
|
[state-dump] - num chunks in flight: 0 |
|
[state-dump] - num chunks remaining: 0 |
|
[state-dump] - max chunks allowed: 409 |
|
[state-dump] OwnershipBasedObjectDirectory: |
|
[state-dump] - num listeners: 0 |
|
[state-dump] - cumulative location updates: 0 |
|
[state-dump] - num location updates per second: 0.000 |
|
[state-dump] - num location lookups per second: 0.000 |
|
[state-dump] - num locations added per second: 0.000 |
|
[state-dump] - num locations removed per second: 0.000 |
|
[state-dump] BufferPool: |
|
[state-dump] - create buffer state map size: 0 |
|
[state-dump] PullManager: |
|
[state-dump] - num bytes available for pulled objects: 2147483648 |
|
[state-dump] - num bytes being pulled (all): 0 |
|
[state-dump] - num bytes being pulled / pinned: 0 |
|
[state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - first get request bundle: N/A |
|
[state-dump] - first wait request bundle: N/A |
|
[state-dump] - first task request bundle: N/A |
|
[state-dump] - num objects queued: 0 |
|
[state-dump] - num objects actively pulled (all): 0 |
|
[state-dump] - num objects actively pulled / pinned: 0 |
|
[state-dump] - num bundles being pulled: 0 |
|
[state-dump] - num pull retries: 0 |
|
[state-dump] - max timeout seconds: 0 |
|
[state-dump] - max timeout request is already processed. No entry. |
|
[state-dump] |
|
[state-dump] WorkerPool: |
|
[state-dump] - registered jobs: 1 |
|
[state-dump] - process_failed_job_config_missing: 0 |
|
[state-dump] - process_failed_rate_limited: 0 |
|
[state-dump] - process_failed_pending_registration: 0 |
|
[state-dump] - process_failed_runtime_env_setup_failed: 0 |
|
[state-dump] - num PYTHON workers: 20 |
|
[state-dump] - num PYTHON drivers: 1 |
|
[state-dump] - num PYTHON pending start requests: 0 |
|
[state-dump] - num PYTHON pending registration requests: 0 |
|
[state-dump] - num object spill callbacks queued: 0 |
|
[state-dump] - num object restore queued: 0 |
|
[state-dump] - num util functions queued: 0 |
|
[state-dump] - num idle workers: 20 |
|
[state-dump] TaskDependencyManager: |
|
[state-dump] - task deps map size: 0 |
|
[state-dump] - get req map size: 0 |
|
[state-dump] - wait req map size: 0 |
|
[state-dump] - local objects map size: 0 |
|
[state-dump] WaitManager: |
|
[state-dump] - num active wait requests: 0 |
|
[state-dump] Subscriber: |
|
[state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] Channel WORKER_REF_REMOVED_CHANNEL |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] Channel WORKER_OBJECT_EVICTION |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] num async plasma notifications: 0 |
|
[state-dump] Remote node managers: |
|
[state-dump] Event stats: |
|
[state-dump] Global stats: 36945 total (35 active) |
|
[state-dump] Queueing time: mean = 16.943 ms, max = 131.865 s, min = 67.000 ns, total = 625.955 s |
|
[state-dump] Execution time: mean = 222.695 us, total = 8.227 s |
|
[state-dump] Event stats: |
|
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 8806 total (0 active), Execution time: mean = 493.521 us, total = 4.346 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 8806 total (0 active), Execution time: mean = 36.109 us, total = 317.979 ms, Queueing time: mean = 100.277 us, max = 2.189 ms, min = 4.142 us, total = 883.041 ms |
|
[state-dump] RaySyncer.OnDemandBroadcasting - 4197 total (1 active), Execution time: mean = 9.378 us, total = 39.359 ms, Queueing time: mean = 80.576 us, max = 2.543 ms, min = 8.344 us, total = 338.179 ms |
|
[state-dump] ObjectManager.UpdateAvailableMemory - 4197 total (0 active), Execution time: mean = 4.915 us, total = 20.630 ms, Queueing time: mean = 95.270 us, max = 9.283 ms, min = 3.503 us, total = 399.848 ms |
|
[state-dump] NodeManager.CheckGC - 4197 total (1 active), Execution time: mean = 2.846 us, total = 11.944 ms, Queueing time: mean = 86.260 us, max = 2.553 ms, min = 6.447 us, total = 362.031 ms |
|
[state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 2100 total (1 active), Execution time: mean = 15.955 us, total = 33.505 ms, Queueing time: mean = 65.295 us, max = 992.162 us, min = 9.895 us, total = 137.119 ms |
|
[state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 1677 total (1 active), Execution time: mean = 435.395 us, total = 730.157 ms, Queueing time: mean = 69.718 us, max = 3.232 ms, min = 8.760 us, total = 116.917 ms |
|
[state-dump] NodeManager.ScheduleAndDispatchTasks - 420 total (1 active), Execution time: mean = 14.095 us, total = 5.920 ms, Queueing time: mean = 76.860 us, max = 2.272 ms, min = 17.139 us, total = 32.281 ms |
|
[state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 420 total (0 active), Execution time: mean = 104.442 us, total = 43.866 ms, Queueing time: mean = 100.734 us, max = 238.952 us, min = 19.563 us, total = 42.308 ms |
|
[state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 420 total (1 active), Execution time: mean = 3.018 us, total = 1.267 ms, Queueing time: mean = 170.300 us, max = 2.205 ms, min = 7.617 us, total = 71.526 ms |
|
[state-dump] NodeManager.deadline_timer.flush_free_objects - 420 total (1 active), Execution time: mean = 7.834 us, total = 3.290 ms, Queueing time: mean = 166.937 us, max = 2.209 ms, min = 10.158 us, total = 70.114 ms |
|
[state-dump] NodeManagerService.grpc_server.GetResourceLoad - 420 total (0 active), Execution time: mean = 616.749 us, total = 259.034 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ClusterResourceManager.ResetRemoteNodeView - 141 total (1 active), Execution time: mean = 7.820 us, total = 1.103 ms, Queueing time: mean = 74.046 us, max = 253.106 us, min = 14.828 us, total = 10.441 ms |
|
[state-dump] ClientConnection.async_read.ProcessMessageHeader - 92 total (21 active), Execution time: mean = 7.533 us, total = 693.037 us, Queueing time: mean = 6.775 s, max = 131.865 s, min = 27.575 us, total = 623.301 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 84 total (0 active), Execution time: mean = 1.290 ms, total = 108.351 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.GcsCheckAlive - 84 total (1 active), Execution time: mean = 249.653 us, total = 20.971 ms, Queueing time: mean = 602.153 us, max = 2.274 ms, min = 115.311 us, total = 50.581 ms |
|
[state-dump] NodeManager.deadline_timer.record_metrics - 84 total (1 active), Execution time: mean = 522.939 us, total = 43.927 ms, Queueing time: mean = 333.397 us, max = 1.700 ms, min = 9.061 us, total = 28.005 ms |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 84 total (0 active), Execution time: mean = 47.769 us, total = 4.013 ms, Queueing time: mean = 99.514 us, max = 241.540 us, min = 15.566 us, total = 8.359 ms |
|
[state-dump] ClientConnection.async_read.ProcessMessage - 71 total (0 active), Execution time: mean = 851.189 us, total = 60.434 ms, Queueing time: mean = 68.562 us, max = 1.027 ms, min = 2.835 us, total = 4.868 ms |
|
[state-dump] NodeManager.deadline_timer.debug_state_dump - 42 total (1 active), Execution time: mean = 1.638 ms, total = 68.795 ms, Queueing time: mean = 59.860 us, max = 147.920 us, min = 11.928 us, total = 2.514 ms |
|
[state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.317 us, total = 28.982 us, Queueing time: mean = 49.767 us, max = 431.510 us, min = 17.047 us, total = 1.095 ms |
|
[state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 11.466 us, total = 240.795 us, Queueing time: mean = 2.139 ms, max = 21.287 ms, min = 13.920 us, total = 44.925 ms |
|
[state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 154.352 us, total = 3.241 ms, Queueing time: mean = 162.625 us, max = 432.570 us, min = 33.451 us, total = 3.415 ms |
|
[state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 21.296 us, total = 447.224 us, Queueing time: mean = 186.718 us, max = 583.430 us, min = 33.345 us, total = 3.921 ms |
|
[state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.630 ms, total = 34.233 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 217.267 us, total = 2.824 ms, Queueing time: mean = 2.965 ms, max = 8.958 ms, min = 32.982 us, total = 38.551 ms |
|
[state-dump] RaySyncer.BroadcastMessage - 8 total (0 active), Execution time: mean = 175.542 us, total = 1.404 ms, Queueing time: mean = 537.375 ns, max = 672.000 ns, min = 148.000 ns, total = 4.299 us |
|
[state-dump] - 8 total (0 active), Execution time: mean = 961.250 ns, total = 7.690 us, Queueing time: mean = 72.430 us, max = 153.432 us, min = 25.031 us, total = 579.438 us |
|
[state-dump] NodeManager.deadline_timer.print_event_loop_stats - 7 total (1 active, 1 running), Execution time: mean = 2.282 ms, total = 15.975 ms, Queueing time: mean = 31.797 us, max = 61.952 us, min = 17.957 us, total = 222.582 us |
|
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 4 total (0 active), Execution time: mean = 768.406 us, total = 3.074 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 4 total (0 active), Execution time: mean = 242.441 us, total = 969.764 us, Queueing time: mean = 89.175 us, max = 112.184 us, min = 37.134 us, total = 356.699 us |
|
[state-dump] NodeManagerService.grpc_server.ReturnWorker - 4 total (0 active), Execution time: mean = 541.409 us, total = 2.166 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 4 total (0 active), Execution time: mean = 95.265 us, total = 381.061 us, Queueing time: mean = 58.353 us, max = 140.746 us, min = 21.653 us, total = 233.411 us |
|
[state-dump] WorkerPool.PopWorkerCallback - 4 total (0 active), Execution time: mean = 46.346 us, total = 185.384 us, Queueing time: mean = 30.612 us, max = 38.510 us, min = 20.335 us, total = 122.448 us |
|
[state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.083 us, total = 4.165 us, Queueing time: mean = 301.000 ns, max = 535.000 ns, min = 67.000 ns, total = 602.000 ns |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.377 ms, total = 2.754 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 504.334 ms, total = 1.009 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 129.555 us, total = 259.110 us, Queueing time: mean = 655.112 us, max = 1.180 ms, min = 129.843 us, total = 1.310 ms |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.419 ms, total = 2.419 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 302.707 us, total = 302.707 us, Queueing time: mean = 113.841 us, max = 113.841 us, min = 113.841 us, total = 113.841 us |
|
[state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 96.544 us, total = 96.544 us, Queueing time: mean = 315.750 us, max = 315.750 us, min = 315.750 us, total = 315.750 us |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 342.194 us, total = 342.194 us, Queueing time: mean = 163.766 us, max = 163.766 us, min = 163.766 us, total = 163.766 us |
|
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.019 s, total = 1.019 s, Queueing time: mean = 90.737 us, max = 90.737 us, min = 90.737 us, total = 90.737 us |
|
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.553 ms, total = 1.553 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.873 ms, total = 1.873 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 29.991 us, total = 29.991 us, Queueing time: mean = 111.550 us, max = 111.550 us, min = 111.550 us, total = 111.550 us |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.569 ms, total = 1.569 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.597 ms, total = 1.597 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 224.222 us, total = 224.222 us, Queueing time: mean = 119.308 us, max = 119.308 us, min = 119.308 us, total = 119.308 us |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 137.575 us, total = 137.575 us, Queueing time: mean = 36.079 us, max = 36.079 us, min = 36.079 us, total = 36.079 us |
|
[state-dump] DebugString() time ms: 1 |
|
[state-dump] |
|
[state-dump] |
|
[2025-01-21 05:55:35,723 I 18747 18775] (raylet) store.cc:564: Plasma store debug dump: |
|
Current usage: 0 / 2.14748 GB |
|
- num bytes created total: 168 |
|
0 pending objects of total size 0MB |
|
- objects spillable: 0 |
|
- bytes spillable: 0 |
|
- objects unsealed: 0 |
|
- bytes unsealed: 0 |
|
- objects in use: 0 |
|
- bytes in use: 0 |
|
- objects evictable: 0 |
|
- bytes evictable: 0 |
|
|
|
- objects created by worker: 0 |
|
- bytes created by worker: 0 |
|
- objects restored: 0 |
|
- bytes restored: 0 |
|
- objects received: 0 |
|
- bytes received: 0 |
|
- objects errored: 0 |
|
- bytes errored: 0 |
|
|
|
[2025-01-21 05:55:36,763 I 18747 18747] (raylet) node_manager.cc:525: [state-dump] NodeManager: |
|
[state-dump] Node ID: 381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d |
|
[state-dump] Node name: 192.168.0.2 |
|
[state-dump] InitialConfigResources: {object_store_memory: 21474836480000, node:192.168.0.2: 10000, node:__internal_head__: 10000, memory: 752056999940000, accelerator_type:A40: 10000, GPU: 20000, CPU: 200000} |
|
[state-dump] ClusterTaskManager: |
|
[state-dump] ========== Node: 381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d ================= |
|
[state-dump] Infeasible queue length: 0 |
|
[state-dump] Schedule queue length: 0 |
|
[state-dump] Dispatch queue length: 0 |
|
[state-dump] num_waiting_for_resource: 0 |
|
[state-dump] num_waiting_for_plasma_memory: 0 |
|
[state-dump] num_waiting_for_remote_node_resources: 0 |
|
[state-dump] num_worker_not_started_by_job_config_not_exist: 0 |
|
[state-dump] num_worker_not_started_by_registration_timeout: 0 |
|
[state-dump] num_tasks_waiting_for_workers: 0 |
|
[state-dump] num_cancelled_tasks: 0 |
|
[state-dump] cluster_resource_scheduler state: |
|
[state-dump] Local id: 688648627895828852 Local resources: {"total":{node:__internal_head__: [10000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], CPU: [200000], memory: [752056999940000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {node:__internal_head__: [10000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], CPU: [200000], memory: [752056999940000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d",} is_draining: 0 is_idle: 1 Cluster resources: node id: 688648627895828852{"total":{node:192.168.0.2: 10000, node:__internal_head__: 10000, memory: 752056999940000, CPU: 200000, accelerator_type:A40: 10000, GPU: 20000, object_store_memory: 21474836480000}}, "available": {node:__internal_head__: 10000, node:192.168.0.2: 10000, accelerator_type:A40: 10000, memory: 752056999940000, CPU: 200000, GPU: 20000, object_store_memory: 21474836480000}}, "labels":{"ray.io/node_id":"381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} |
|
[state-dump] Waiting tasks size: 0 |
|
[state-dump] Number of executing tasks: 0 |
|
[state-dump] Number of pinned task arguments: 0 |
|
[state-dump] Number of total spilled tasks: 0 |
|
[state-dump] Number of spilled waiting tasks: 0 |
|
[state-dump] Number of spilled unschedulable tasks: 0 |
|
[state-dump] Resource usage { |
|
[state-dump] } |
|
[state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: |
|
[state-dump] |
|
[state-dump] Running tasks by scheduling class: |
|
[state-dump] ================================================== |
|
[state-dump] |
|
[state-dump] ClusterResources: |
|
[state-dump] LocalObjectManager: |
|
[state-dump] - num pinned objects: 0 |
|
[state-dump] - pinned objects size: 0 |
|
[state-dump] - num objects pending restore: 0 |
|
[state-dump] - num objects pending spill: 0 |
|
[state-dump] - num bytes pending spill: 0 |
|
[state-dump] - num bytes currently spilled: 0 |
|
[state-dump] - cumulative spill requests: 0 |
|
[state-dump] - cumulative restore requests: 0 |
|
[state-dump] - spilled objects pending delete: 0 |
|
[state-dump] |
|
[state-dump] ObjectManager: |
|
[state-dump] - num local objects: 0 |
|
[state-dump] - num unfulfilled push requests: 0 |
|
[state-dump] - num object pull requests: 0 |
|
[state-dump] - num chunks received total: 0 |
|
[state-dump] - num chunks received failed (all): 0 |
|
[state-dump] - num chunks received failed / cancelled: 0 |
|
[state-dump] - num chunks received failed / plasma error: 0 |
|
[state-dump] Event stats: |
|
[state-dump] Global stats: 0 total (0 active) |
|
[state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] Execution time: mean = -nan s, total = 0.000 s |
|
[state-dump] Event stats: |
|
[state-dump] PushManager: |
|
[state-dump] - num pushes in flight: 0 |
|
[state-dump] - num chunks in flight: 0 |
|
[state-dump] - num chunks remaining: 0 |
|
[state-dump] - max chunks allowed: 409 |
|
[state-dump] OwnershipBasedObjectDirectory: |
|
[state-dump] - num listeners: 0 |
|
[state-dump] - cumulative location updates: 0 |
|
[state-dump] - num location updates per second: 0.000 |
|
[state-dump] - num location lookups per second: 0.000 |
|
[state-dump] - num locations added per second: 0.000 |
|
[state-dump] - num locations removed per second: 0.000 |
|
[state-dump] BufferPool: |
|
[state-dump] - create buffer state map size: 0 |
|
[state-dump] PullManager: |
|
[state-dump] - num bytes available for pulled objects: 2147483648 |
|
[state-dump] - num bytes being pulled (all): 0 |
|
[state-dump] - num bytes being pulled / pinned: 0 |
|
[state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - first get request bundle: N/A |
|
[state-dump] - first wait request bundle: N/A |
|
[state-dump] - first task request bundle: N/A |
|
[state-dump] - num objects queued: 0 |
|
[state-dump] - num objects actively pulled (all): 0 |
|
[state-dump] - num objects actively pulled / pinned: 0 |
|
[state-dump] - num bundles being pulled: 0 |
|
[state-dump] - num pull retries: 0 |
|
[state-dump] - max timeout seconds: 0 |
|
[state-dump] - max timeout request is already processed. No entry. |
|
[state-dump] |
|
[state-dump] WorkerPool: |
|
[state-dump] - registered jobs: 1 |
|
[state-dump] - process_failed_job_config_missing: 0 |
|
[state-dump] - process_failed_rate_limited: 0 |
|
[state-dump] - process_failed_pending_registration: 0 |
|
[state-dump] - process_failed_runtime_env_setup_failed: 0 |
|
[state-dump] - num PYTHON workers: 20 |
|
[state-dump] - num PYTHON drivers: 1 |
|
[state-dump] - num PYTHON pending start requests: 0 |
|
[state-dump] - num PYTHON pending registration requests: 0 |
|
[state-dump] - num object spill callbacks queued: 0 |
|
[state-dump] - num object restore queued: 0 |
|
[state-dump] - num util functions queued: 0 |
|
[state-dump] - num idle workers: 20 |
|
[state-dump] TaskDependencyManager: |
|
[state-dump] - task deps map size: 0 |
|
[state-dump] - get req map size: 0 |
|
[state-dump] - wait req map size: 0 |
|
[state-dump] - local objects map size: 0 |
|
[state-dump] WaitManager: |
|
[state-dump] - num active wait requests: 0 |
|
[state-dump] Subscriber: |
|
[state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] Channel WORKER_REF_REMOVED_CHANNEL |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] Channel WORKER_OBJECT_EVICTION |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] num async plasma notifications: 0 |
|
[state-dump] Remote node managers: |
|
[state-dump] Event stats: |
|
[state-dump] Global stats: 42184 total (35 active) |
|
[state-dump] Queueing time: mean = 21.879 ms, max = 149.071 s, min = 67.000 ns, total = 922.936 s |
|
[state-dump] Execution time: mean = 217.575 us, total = 9.178 s |
|
[state-dump] Event stats: |
|
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 10064 total (0 active), Execution time: mean = 500.467 us, total = 5.037 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 10064 total (0 active), Execution time: mean = 36.935 us, total = 371.710 ms, Queueing time: mean = 102.251 us, max = 2.189 ms, min = 4.142 us, total = 1.029 s |
|
[state-dump] RaySyncer.OnDemandBroadcasting - 4796 total (1 active), Execution time: mean = 9.405 us, total = 45.108 ms, Queueing time: mean = 80.805 us, max = 2.543 ms, min = 8.344 us, total = 387.543 ms |
|
[state-dump] ObjectManager.UpdateAvailableMemory - 4796 total (0 active), Execution time: mean = 4.994 us, total = 23.953 ms, Queueing time: mean = 97.636 us, max = 9.283 ms, min = 3.503 us, total = 468.263 ms |
|
[state-dump] NodeManager.CheckGC - 4796 total (1 active), Execution time: mean = 2.843 us, total = 13.637 ms, Queueing time: mean = 86.512 us, max = 2.553 ms, min = 6.447 us, total = 414.911 ms |
|
[state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 2400 total (1 active), Execution time: mean = 15.975 us, total = 38.340 ms, Queueing time: mean = 65.877 us, max = 992.162 us, min = 9.895 us, total = 158.105 ms |
|
[state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 1917 total (1 active), Execution time: mean = 435.521 us, total = 834.893 ms, Queueing time: mean = 70.003 us, max = 3.232 ms, min = 8.760 us, total = 134.195 ms |
|
[state-dump] NodeManager.ScheduleAndDispatchTasks - 480 total (1 active), Execution time: mean = 14.068 us, total = 6.753 ms, Queueing time: mean = 76.867 us, max = 2.272 ms, min = 17.139 us, total = 36.896 ms |
|
[state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 480 total (0 active), Execution time: mean = 104.746 us, total = 50.278 ms, Queueing time: mean = 102.833 us, max = 238.952 us, min = 19.563 us, total = 49.360 ms |
|
[state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 480 total (1 active), Execution time: mean = 3.024 us, total = 1.452 ms, Queueing time: mean = 168.780 us, max = 2.205 ms, min = 7.617 us, total = 81.015 ms |
|
[state-dump] NodeManager.deadline_timer.flush_free_objects - 480 total (1 active), Execution time: mean = 7.863 us, total = 3.774 ms, Queueing time: mean = 165.422 us, max = 2.209 ms, min = 10.158 us, total = 79.403 ms |
|
[state-dump] NodeManagerService.grpc_server.GetResourceLoad - 480 total (0 active), Execution time: mean = 621.145 us, total = 298.149 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ClusterResourceManager.ResetRemoteNodeView - 161 total (1 active), Execution time: mean = 7.786 us, total = 1.254 ms, Queueing time: mean = 75.671 us, max = 253.106 us, min = 14.828 us, total = 12.183 ms |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 96 total (0 active), Execution time: mean = 1.291 ms, total = 123.899 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.GcsCheckAlive - 96 total (1 active), Execution time: mean = 251.878 us, total = 24.180 ms, Queueing time: mean = 594.511 us, max = 2.274 ms, min = 115.311 us, total = 57.073 ms |
|
[state-dump] NodeManager.deadline_timer.record_metrics - 96 total (1 active), Execution time: mean = 516.889 us, total = 49.621 ms, Queueing time: mean = 333.887 us, max = 1.700 ms, min = 9.061 us, total = 32.053 ms |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 96 total (0 active), Execution time: mean = 47.691 us, total = 4.578 ms, Queueing time: mean = 100.571 us, max = 241.540 us, min = 15.566 us, total = 9.655 ms |
|
[state-dump] ClientConnection.async_read.ProcessMessageHeader - 94 total (21 active), Execution time: mean = 7.655 us, total = 719.592 us, Queueing time: mean = 9.786 s, max = 149.071 s, min = 27.575 us, total = 919.883 s |
|
[state-dump] ClientConnection.async_read.ProcessMessage - 73 total (0 active), Execution time: mean = 828.378 us, total = 60.472 ms, Queueing time: mean = 68.615 us, max = 1.027 ms, min = 2.835 us, total = 5.009 ms |
|
[state-dump] NodeManager.deadline_timer.debug_state_dump - 48 total (1 active), Execution time: mean = 1.630 ms, total = 78.258 ms, Queueing time: mean = 62.661 us, max = 147.920 us, min = 11.928 us, total = 3.008 ms |
|
[state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.317 us, total = 28.982 us, Queueing time: mean = 49.767 us, max = 431.510 us, min = 17.047 us, total = 1.095 ms |
|
[state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 154.352 us, total = 3.241 ms, Queueing time: mean = 162.625 us, max = 432.570 us, min = 33.451 us, total = 3.415 ms |
|
[state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 11.466 us, total = 240.795 us, Queueing time: mean = 2.139 ms, max = 21.287 ms, min = 13.920 us, total = 44.925 ms |
|
[state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 21.296 us, total = 447.224 us, Queueing time: mean = 186.718 us, max = 583.430 us, min = 33.345 us, total = 3.921 ms |
|
[state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.630 ms, total = 34.233 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 217.267 us, total = 2.824 ms, Queueing time: mean = 2.965 ms, max = 8.958 ms, min = 32.982 us, total = 38.551 ms |
|
[state-dump] RaySyncer.BroadcastMessage - 9 total (0 active), Execution time: mean = 182.148 us, total = 1.639 ms, Queueing time: mean = 558.444 ns, max = 727.000 ns, min = 148.000 ns, total = 5.026 us |
|
[state-dump] - 9 total (0 active), Execution time: mean = 960.889 ns, total = 8.648 us, Queueing time: mean = 82.816 us, max = 165.908 us, min = 25.031 us, total = 745.346 us |
|
[state-dump] NodeManager.deadline_timer.print_event_loop_stats - 8 total (1 active, 1 running), Execution time: mean = 2.262 ms, total = 18.095 ms, Queueing time: mean = 34.536 us, max = 61.952 us, min = 17.957 us, total = 276.291 us |
|
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 5 total (0 active), Execution time: mean = 783.478 us, total = 3.917 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 5 total (0 active), Execution time: mean = 235.036 us, total = 1.175 ms, Queueing time: mean = 96.003 us, max = 123.315 us, min = 37.134 us, total = 480.014 us |
|
[state-dump] NodeManagerService.grpc_server.ReturnWorker - 5 total (0 active), Execution time: mean = 551.562 us, total = 2.758 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 5 total (0 active), Execution time: mean = 98.072 us, total = 490.362 us, Queueing time: mean = 50.658 us, max = 140.746 us, min = 19.878 us, total = 253.289 us |
|
[state-dump] WorkerPool.PopWorkerCallback - 5 total (0 active), Execution time: mean = 47.193 us, total = 235.967 us, Queueing time: mean = 28.959 us, max = 38.510 us, min = 20.335 us, total = 144.797 us |
|
[state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.083 us, total = 4.165 us, Queueing time: mean = 301.000 ns, max = 535.000 ns, min = 67.000 ns, total = 602.000 ns |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 504.334 ms, total = 1.009 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.377 ms, total = 2.754 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 129.555 us, total = 259.110 us, Queueing time: mean = 655.112 us, max = 1.180 ms, min = 129.843 us, total = 1.310 ms |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.419 ms, total = 2.419 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 302.707 us, total = 302.707 us, Queueing time: mean = 113.841 us, max = 113.841 us, min = 113.841 us, total = 113.841 us |
|
[state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 96.544 us, total = 96.544 us, Queueing time: mean = 315.750 us, max = 315.750 us, min = 315.750 us, total = 315.750 us |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 342.194 us, total = 342.194 us, Queueing time: mean = 163.766 us, max = 163.766 us, min = 163.766 us, total = 163.766 us |
|
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.019 s, total = 1.019 s, Queueing time: mean = 90.737 us, max = 90.737 us, min = 90.737 us, total = 90.737 us |
|
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.553 ms, total = 1.553 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.873 ms, total = 1.873 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 29.991 us, total = 29.991 us, Queueing time: mean = 111.550 us, max = 111.550 us, min = 111.550 us, total = 111.550 us |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.569 ms, total = 1.569 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.597 ms, total = 1.597 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 224.222 us, total = 224.222 us, Queueing time: mean = 119.308 us, max = 119.308 us, min = 119.308 us, total = 119.308 us |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 137.575 us, total = 137.575 us, Queueing time: mean = 36.079 us, max = 36.079 us, min = 36.079 us, total = 36.079 us |
|
[state-dump] DebugString() time ms: 1 |
|
[state-dump] |
|
[state-dump] |
|
[2025-01-21 05:56:35,724 I 18747 18775] (raylet) store.cc:564: Plasma store debug dump: |
|
Current usage: 0 / 2.14748 GB |
|
- num bytes created total: 168 |
|
0 pending objects of total size 0MB |
|
- objects spillable: 0 |
|
- bytes spillable: 0 |
|
- objects unsealed: 0 |
|
- bytes unsealed: 0 |
|
- objects in use: 0 |
|
- bytes in use: 0 |
|
- objects evictable: 0 |
|
- bytes evictable: 0 |
|
|
|
- objects created by worker: 0 |
|
- bytes created by worker: 0 |
|
- objects restored: 0 |
|
- bytes restored: 0 |
|
- objects received: 0 |
|
- bytes received: 0 |
|
- objects errored: 0 |
|
- bytes errored: 0 |
|
|
|
[2025-01-21 05:56:36,765 I 18747 18747] (raylet) node_manager.cc:525: [state-dump] NodeManager: |
|
[state-dump] Node ID: 381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d |
|
[state-dump] Node name: 192.168.0.2 |
|
[state-dump] InitialConfigResources: {object_store_memory: 21474836480000, node:192.168.0.2: 10000, node:__internal_head__: 10000, memory: 752056999940000, accelerator_type:A40: 10000, GPU: 20000, CPU: 200000} |
|
[state-dump] ClusterTaskManager: |
|
[state-dump] ========== Node: 381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d ================= |
|
[state-dump] Infeasible queue length: 0 |
|
[state-dump] Schedule queue length: 0 |
|
[state-dump] Dispatch queue length: 0 |
|
[state-dump] num_waiting_for_resource: 0 |
|
[state-dump] num_waiting_for_plasma_memory: 0 |
|
[state-dump] num_waiting_for_remote_node_resources: 0 |
|
[state-dump] num_worker_not_started_by_job_config_not_exist: 0 |
|
[state-dump] num_worker_not_started_by_registration_timeout: 0 |
|
[state-dump] num_tasks_waiting_for_workers: 0 |
|
[state-dump] num_cancelled_tasks: 0 |
|
[state-dump] cluster_resource_scheduler state: |
|
[state-dump] Local id: 688648627895828852 Local resources: {"total":{node:__internal_head__: [10000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], CPU: [200000], memory: [752056999940000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {node:__internal_head__: [10000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], CPU: [200000], memory: [752056999940000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d",} is_draining: 0 is_idle: 1 Cluster resources: node id: 688648627895828852{"total":{node:192.168.0.2: 10000, node:__internal_head__: 10000, memory: 752056999940000, CPU: 200000, accelerator_type:A40: 10000, GPU: 20000, object_store_memory: 21474836480000}}, "available": {node:__internal_head__: 10000, node:192.168.0.2: 10000, accelerator_type:A40: 10000, memory: 752056999940000, CPU: 200000, GPU: 20000, object_store_memory: 21474836480000}}, "labels":{"ray.io/node_id":"381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} |
|
[state-dump] Waiting tasks size: 0 |
|
[state-dump] Number of executing tasks: 0 |
|
[state-dump] Number of pinned task arguments: 0 |
|
[state-dump] Number of total spilled tasks: 0 |
|
[state-dump] Number of spilled waiting tasks: 0 |
|
[state-dump] Number of spilled unschedulable tasks: 0 |
|
[state-dump] Resource usage { |
|
[state-dump] } |
|
[state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: |
|
[state-dump] |
|
[state-dump] Running tasks by scheduling class: |
|
[state-dump] ================================================== |
|
[state-dump] |
|
[state-dump] ClusterResources: |
|
[state-dump] LocalObjectManager: |
|
[state-dump] - num pinned objects: 0 |
|
[state-dump] - pinned objects size: 0 |
|
[state-dump] - num objects pending restore: 0 |
|
[state-dump] - num objects pending spill: 0 |
|
[state-dump] - num bytes pending spill: 0 |
|
[state-dump] - num bytes currently spilled: 0 |
|
[state-dump] - cumulative spill requests: 0 |
|
[state-dump] - cumulative restore requests: 0 |
|
[state-dump] - spilled objects pending delete: 0 |
|
[state-dump] |
|
[state-dump] ObjectManager: |
|
[state-dump] - num local objects: 0 |
|
[state-dump] - num unfulfilled push requests: 0 |
|
[state-dump] - num object pull requests: 0 |
|
[state-dump] - num chunks received total: 0 |
|
[state-dump] - num chunks received failed (all): 0 |
|
[state-dump] - num chunks received failed / cancelled: 0 |
|
[state-dump] - num chunks received failed / plasma error: 0 |
|
[state-dump] Event stats: |
|
[state-dump] Global stats: 0 total (0 active) |
|
[state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] Execution time: mean = -nan s, total = 0.000 s |
|
[state-dump] Event stats: |
|
[state-dump] PushManager: |
|
[state-dump] - num pushes in flight: 0 |
|
[state-dump] - num chunks in flight: 0 |
|
[state-dump] - num chunks remaining: 0 |
|
[state-dump] - max chunks allowed: 409 |
|
[state-dump] OwnershipBasedObjectDirectory: |
|
[state-dump] - num listeners: 0 |
|
[state-dump] - cumulative location updates: 0 |
|
[state-dump] - num location updates per second: 0.000 |
|
[state-dump] - num location lookups per second: 0.000 |
|
[state-dump] - num locations added per second: 0.000 |
|
[state-dump] - num locations removed per second: 0.000 |
|
[state-dump] BufferPool: |
|
[state-dump] - create buffer state map size: 0 |
|
[state-dump] PullManager: |
|
[state-dump] - num bytes available for pulled objects: 2147483648 |
|
[state-dump] - num bytes being pulled (all): 0 |
|
[state-dump] - num bytes being pulled / pinned: 0 |
|
[state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - first get request bundle: N/A |
|
[state-dump] - first wait request bundle: N/A |
|
[state-dump] - first task request bundle: N/A |
|
[state-dump] - num objects queued: 0 |
|
[state-dump] - num objects actively pulled (all): 0 |
|
[state-dump] - num objects actively pulled / pinned: 0 |
|
[state-dump] - num bundles being pulled: 0 |
|
[state-dump] - num pull retries: 0 |
|
[state-dump] - max timeout seconds: 0 |
|
[state-dump] - max timeout request is already processed. No entry. |
|
[state-dump] |
|
[state-dump] WorkerPool: |
|
[state-dump] - registered jobs: 1 |
|
[state-dump] - process_failed_job_config_missing: 0 |
|
[state-dump] - process_failed_rate_limited: 0 |
|
[state-dump] - process_failed_pending_registration: 0 |
|
[state-dump] - process_failed_runtime_env_setup_failed: 0 |
|
[state-dump] - num PYTHON workers: 20 |
|
[state-dump] - num PYTHON drivers: 1 |
|
[state-dump] - num PYTHON pending start requests: 0 |
|
[state-dump] - num PYTHON pending registration requests: 0 |
|
[state-dump] - num object spill callbacks queued: 0 |
|
[state-dump] - num object restore queued: 0 |
|
[state-dump] - num util functions queued: 0 |
|
[state-dump] - num idle workers: 20 |
|
[state-dump] TaskDependencyManager: |
|
[state-dump] - task deps map size: 0 |
|
[state-dump] - get req map size: 0 |
|
[state-dump] - wait req map size: 0 |
|
[state-dump] - local objects map size: 0 |
|
[state-dump] WaitManager: |
|
[state-dump] - num active wait requests: 0 |
|
[state-dump] Subscriber: |
|
[state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] Channel WORKER_REF_REMOVED_CHANNEL |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] Channel WORKER_OBJECT_EVICTION |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] num async plasma notifications: 0 |
|
[state-dump] Remote node managers: |
|
[state-dump] Event stats: |
|
[state-dump] Global stats: 47408 total (35 active) |
|
[state-dump] Queueing time: mean = 19.476 ms, max = 149.071 s, min = 67.000 ns, total = 923.340 s |
|
[state-dump] Execution time: mean = 212.946 us, total = 10.095 s |
|
[state-dump] Event stats: |
|
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 11319 total (0 active), Execution time: mean = 503.385 us, total = 5.698 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 11319 total (0 active), Execution time: mean = 37.209 us, total = 421.172 ms, Queueing time: mean = 102.663 us, max = 2.189 ms, min = 4.142 us, total = 1.162 s |
|
[state-dump] RaySyncer.OnDemandBroadcasting - 5396 total (1 active), Execution time: mean = 9.506 us, total = 51.295 ms, Queueing time: mean = 82.835 us, max = 3.517 ms, min = 8.344 us, total = 446.979 ms |
|
[state-dump] ObjectManager.UpdateAvailableMemory - 5396 total (0 active), Execution time: mean = 5.037 us, total = 27.177 ms, Queueing time: mean = 98.076 us, max = 9.283 ms, min = 3.503 us, total = 529.218 ms |
|
[state-dump] NodeManager.CheckGC - 5396 total (1 active), Execution time: mean = 2.852 us, total = 15.392 ms, Queueing time: mean = 88.631 us, max = 3.519 ms, min = 6.447 us, total = 478.255 ms |
|
[state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 2700 total (1 active), Execution time: mean = 16.231 us, total = 43.823 ms, Queueing time: mean = 66.884 us, max = 992.162 us, min = 9.895 us, total = 180.588 ms |
|
[state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 2156 total (1 active), Execution time: mean = 435.697 us, total = 939.363 ms, Queueing time: mean = 70.226 us, max = 3.232 ms, min = 8.760 us, total = 151.407 ms |
|
[state-dump] NodeManager.ScheduleAndDispatchTasks - 540 total (1 active), Execution time: mean = 14.069 us, total = 7.597 ms, Queueing time: mean = 77.286 us, max = 2.272 ms, min = 17.139 us, total = 41.735 ms |
|
[state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 540 total (0 active), Execution time: mean = 104.914 us, total = 56.654 ms, Queueing time: mean = 103.730 us, max = 238.952 us, min = 19.403 us, total = 56.014 ms |
|
[state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 540 total (1 active), Execution time: mean = 3.022 us, total = 1.632 ms, Queueing time: mean = 169.226 us, max = 2.205 ms, min = 6.247 us, total = 91.382 ms |
|
[state-dump] NodeManager.deadline_timer.flush_free_objects - 540 total (1 active), Execution time: mean = 7.931 us, total = 4.283 ms, Queueing time: mean = 165.818 us, max = 2.209 ms, min = 9.779 us, total = 89.542 ms |
|
[state-dump] NodeManagerService.grpc_server.GetResourceLoad - 540 total (0 active), Execution time: mean = 624.341 us, total = 337.144 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ClusterResourceManager.ResetRemoteNodeView - 181 total (1 active), Execution time: mean = 7.821 us, total = 1.416 ms, Queueing time: mean = 74.655 us, max = 253.106 us, min = 11.307 us, total = 13.513 ms |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 108 total (0 active), Execution time: mean = 1.294 ms, total = 139.751 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.GcsCheckAlive - 108 total (1 active), Execution time: mean = 251.793 us, total = 27.194 ms, Queueing time: mean = 598.714 us, max = 2.274 ms, min = 115.311 us, total = 64.661 ms |
|
[state-dump] NodeManager.deadline_timer.record_metrics - 108 total (1 active), Execution time: mean = 517.475 us, total = 55.887 ms, Queueing time: mean = 337.803 us, max = 1.700 ms, min = 9.061 us, total = 36.483 ms |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 108 total (0 active), Execution time: mean = 47.963 us, total = 5.180 ms, Queueing time: mean = 100.733 us, max = 241.540 us, min = 14.505 us, total = 10.879 ms |
|
[state-dump] ClientConnection.async_read.ProcessMessageHeader - 94 total (21 active), Execution time: mean = 7.655 us, total = 719.592 us, Queueing time: mean = 9.786 s, max = 149.071 s, min = 27.575 us, total = 919.883 s |
|
[state-dump] ClientConnection.async_read.ProcessMessage - 73 total (0 active), Execution time: mean = 828.378 us, total = 60.472 ms, Queueing time: mean = 68.615 us, max = 1.027 ms, min = 2.835 us, total = 5.009 ms |
|
[state-dump] NodeManager.deadline_timer.debug_state_dump - 54 total (1 active), Execution time: mean = 1.635 ms, total = 88.277 ms, Queueing time: mean = 65.135 us, max = 147.920 us, min = 11.928 us, total = 3.517 ms |
|
[state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.317 us, total = 28.982 us, Queueing time: mean = 49.767 us, max = 431.510 us, min = 17.047 us, total = 1.095 ms |
|
[state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 154.352 us, total = 3.241 ms, Queueing time: mean = 162.625 us, max = 432.570 us, min = 33.451 us, total = 3.415 ms |
|
[state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 11.466 us, total = 240.795 us, Queueing time: mean = 2.139 ms, max = 21.287 ms, min = 13.920 us, total = 44.925 ms |
|
[state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 21.296 us, total = 447.224 us, Queueing time: mean = 186.718 us, max = 583.430 us, min = 33.345 us, total = 3.921 ms |
|
[state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.630 ms, total = 34.233 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 217.267 us, total = 2.824 ms, Queueing time: mean = 2.965 ms, max = 8.958 ms, min = 32.982 us, total = 38.551 ms |
|
[state-dump] RaySyncer.BroadcastMessage - 9 total (0 active), Execution time: mean = 182.148 us, total = 1.639 ms, Queueing time: mean = 558.444 ns, max = 727.000 ns, min = 148.000 ns, total = 5.026 us |
|
[state-dump] NodeManager.deadline_timer.print_event_loop_stats - 9 total (1 active, 1 running), Execution time: mean = 2.305 ms, total = 20.742 ms, Queueing time: mean = 42.685 us, max = 107.871 us, min = 17.957 us, total = 384.162 us |
|
[state-dump] - 9 total (0 active), Execution time: mean = 960.889 ns, total = 8.648 us, Queueing time: mean = 82.816 us, max = 165.908 us, min = 25.031 us, total = 745.346 us |
|
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 5 total (0 active), Execution time: mean = 783.478 us, total = 3.917 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 5 total (0 active), Execution time: mean = 235.036 us, total = 1.175 ms, Queueing time: mean = 96.003 us, max = 123.315 us, min = 37.134 us, total = 480.014 us |
|
[state-dump] NodeManagerService.grpc_server.ReturnWorker - 5 total (0 active), Execution time: mean = 551.562 us, total = 2.758 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 5 total (0 active), Execution time: mean = 98.072 us, total = 490.362 us, Queueing time: mean = 50.658 us, max = 140.746 us, min = 19.878 us, total = 253.289 us |
|
[state-dump] WorkerPool.PopWorkerCallback - 5 total (0 active), Execution time: mean = 47.193 us, total = 235.967 us, Queueing time: mean = 28.959 us, max = 38.510 us, min = 20.335 us, total = 144.797 us |
|
[state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.083 us, total = 4.165 us, Queueing time: mean = 301.000 ns, max = 535.000 ns, min = 67.000 ns, total = 602.000 ns |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 504.334 ms, total = 1.009 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 129.555 us, total = 259.110 us, Queueing time: mean = 655.112 us, max = 1.180 ms, min = 129.843 us, total = 1.310 ms |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.377 ms, total = 2.754 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.419 ms, total = 2.419 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 302.707 us, total = 302.707 us, Queueing time: mean = 113.841 us, max = 113.841 us, min = 113.841 us, total = 113.841 us |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 342.194 us, total = 342.194 us, Queueing time: mean = 163.766 us, max = 163.766 us, min = 163.766 us, total = 163.766 us |
|
[state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 96.544 us, total = 96.544 us, Queueing time: mean = 315.750 us, max = 315.750 us, min = 315.750 us, total = 315.750 us |
|
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.019 s, total = 1.019 s, Queueing time: mean = 90.737 us, max = 90.737 us, min = 90.737 us, total = 90.737 us |
|
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.553 ms, total = 1.553 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.873 ms, total = 1.873 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 29.991 us, total = 29.991 us, Queueing time: mean = 111.550 us, max = 111.550 us, min = 111.550 us, total = 111.550 us |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.569 ms, total = 1.569 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.597 ms, total = 1.597 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 224.222 us, total = 224.222 us, Queueing time: mean = 119.308 us, max = 119.308 us, min = 119.308 us, total = 119.308 us |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 137.575 us, total = 137.575 us, Queueing time: mean = 36.079 us, max = 36.079 us, min = 36.079 us, total = 36.079 us |
|
[state-dump] DebugString() time ms: 1 |
|
[state-dump] |
|
[state-dump] |
|
[2025-01-21 05:57:35,724 I 18747 18775] (raylet) store.cc:564: Plasma store debug dump: |
|
Current usage: 0 / 2.14748 GB |
|
- num bytes created total: 168 |
|
0 pending objects of total size 0MB |
|
- objects spillable: 0 |
|
- bytes spillable: 0 |
|
- objects unsealed: 0 |
|
- bytes unsealed: 0 |
|
- objects in use: 0 |
|
- bytes in use: 0 |
|
- objects evictable: 0 |
|
- bytes evictable: 0 |
|
|
|
- objects created by worker: 0 |
|
- bytes created by worker: 0 |
|
- objects restored: 0 |
|
- bytes restored: 0 |
|
- objects received: 0 |
|
- bytes received: 0 |
|
- objects errored: 0 |
|
- bytes errored: 0 |
|
|
|
[2025-01-21 05:57:36,768 I 18747 18747] (raylet) node_manager.cc:525: [state-dump] NodeManager: |
|
[state-dump] Node ID: 381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d |
|
[state-dump] Node name: 192.168.0.2 |
|
[state-dump] InitialConfigResources: {object_store_memory: 21474836480000, node:192.168.0.2: 10000, node:__internal_head__: 10000, memory: 752056999940000, accelerator_type:A40: 10000, GPU: 20000, CPU: 200000} |
|
[state-dump] ClusterTaskManager: |
|
[state-dump] ========== Node: 381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d ================= |
|
[state-dump] Infeasible queue length: 0 |
|
[state-dump] Schedule queue length: 0 |
|
[state-dump] Dispatch queue length: 0 |
|
[state-dump] num_waiting_for_resource: 0 |
|
[state-dump] num_waiting_for_plasma_memory: 0 |
|
[state-dump] num_waiting_for_remote_node_resources: 0 |
|
[state-dump] num_worker_not_started_by_job_config_not_exist: 0 |
|
[state-dump] num_worker_not_started_by_registration_timeout: 0 |
|
[state-dump] num_tasks_waiting_for_workers: 0 |
|
[state-dump] num_cancelled_tasks: 0 |
|
[state-dump] cluster_resource_scheduler state: |
|
[state-dump] Local id: 688648627895828852 Local resources: {"total":{node:__internal_head__: [10000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], CPU: [200000], memory: [752056999940000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {node:__internal_head__: [10000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], CPU: [200000], memory: [752056999940000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d",} is_draining: 0 is_idle: 1 Cluster resources: node id: 688648627895828852{"total":{node:192.168.0.2: 10000, node:__internal_head__: 10000, memory: 752056999940000, CPU: 200000, accelerator_type:A40: 10000, GPU: 20000, object_store_memory: 21474836480000}}, "available": {node:__internal_head__: 10000, node:192.168.0.2: 10000, accelerator_type:A40: 10000, memory: 752056999940000, CPU: 200000, GPU: 20000, object_store_memory: 21474836480000}}, "labels":{"ray.io/node_id":"381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} |
|
[state-dump] Waiting tasks size: 0 |
|
[state-dump] Number of executing tasks: 0 |
|
[state-dump] Number of pinned task arguments: 0 |
|
[state-dump] Number of total spilled tasks: 0 |
|
[state-dump] Number of spilled waiting tasks: 0 |
|
[state-dump] Number of spilled unschedulable tasks: 0 |
|
[state-dump] Resource usage { |
|
[state-dump] } |
|
[state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: |
|
[state-dump] |
|
[state-dump] Running tasks by scheduling class: |
|
[state-dump] ================================================== |
|
[state-dump] |
|
[state-dump] ClusterResources: |
|
[state-dump] LocalObjectManager: |
|
[state-dump] - num pinned objects: 0 |
|
[state-dump] - pinned objects size: 0 |
|
[state-dump] - num objects pending restore: 0 |
|
[state-dump] - num objects pending spill: 0 |
|
[state-dump] - num bytes pending spill: 0 |
|
[state-dump] - num bytes currently spilled: 0 |
|
[state-dump] - cumulative spill requests: 0 |
|
[state-dump] - cumulative restore requests: 0 |
|
[state-dump] - spilled objects pending delete: 0 |
|
[state-dump] |
|
[state-dump] ObjectManager: |
|
[state-dump] - num local objects: 0 |
|
[state-dump] - num unfulfilled push requests: 0 |
|
[state-dump] - num object pull requests: 0 |
|
[state-dump] - num chunks received total: 0 |
|
[state-dump] - num chunks received failed (all): 0 |
|
[state-dump] - num chunks received failed / cancelled: 0 |
|
[state-dump] - num chunks received failed / plasma error: 0 |
|
[state-dump] Event stats: |
|
[state-dump] Global stats: 0 total (0 active) |
|
[state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] Execution time: mean = -nan s, total = 0.000 s |
|
[state-dump] Event stats: |
|
[state-dump] PushManager: |
|
[state-dump] - num pushes in flight: 0 |
|
[state-dump] - num chunks in flight: 0 |
|
[state-dump] - num chunks remaining: 0 |
|
[state-dump] - max chunks allowed: 409 |
|
[state-dump] OwnershipBasedObjectDirectory: |
|
[state-dump] - num listeners: 0 |
|
[state-dump] - cumulative location updates: 0 |
|
[state-dump] - num location updates per second: 0.000 |
|
[state-dump] - num location lookups per second: 0.000 |
|
[state-dump] - num locations added per second: 0.000 |
|
[state-dump] - num locations removed per second: 0.000 |
|
[state-dump] BufferPool: |
|
[state-dump] - create buffer state map size: 0 |
|
[state-dump] PullManager: |
|
[state-dump] - num bytes available for pulled objects: 2147483648 |
|
[state-dump] - num bytes being pulled (all): 0 |
|
[state-dump] - num bytes being pulled / pinned: 0 |
|
[state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
[state-dump] - first get request bundle: N/A |
|
[state-dump] - first wait request bundle: N/A |
|
[state-dump] - first task request bundle: N/A |
|
[state-dump] - num objects queued: 0 |
|
[state-dump] - num objects actively pulled (all): 0 |
|
[state-dump] - num objects actively pulled / pinned: 0 |
|
[state-dump] - num bundles being pulled: 0 |
|
[state-dump] - num pull retries: 0 |
|
[state-dump] - max timeout seconds: 0 |
|
[state-dump] - max timeout request is already processed. No entry. |
|
[state-dump] |
|
[state-dump] WorkerPool: |
|
[state-dump] - registered jobs: 1 |
|
[state-dump] - process_failed_job_config_missing: 0 |
|
[state-dump] - process_failed_rate_limited: 0 |
|
[state-dump] - process_failed_pending_registration: 0 |
|
[state-dump] - process_failed_runtime_env_setup_failed: 0 |
|
[state-dump] - num PYTHON workers: 20 |
|
[state-dump] - num PYTHON drivers: 1 |
|
[state-dump] - num PYTHON pending start requests: 0 |
|
[state-dump] - num PYTHON pending registration requests: 0 |
|
[state-dump] - num object spill callbacks queued: 0 |
|
[state-dump] - num object restore queued: 0 |
|
[state-dump] - num util functions queued: 0 |
|
[state-dump] - num idle workers: 20 |
|
[state-dump] TaskDependencyManager: |
|
[state-dump] - task deps map size: 0 |
|
[state-dump] - get req map size: 0 |
|
[state-dump] - wait req map size: 0 |
|
[state-dump] - local objects map size: 0 |
|
[state-dump] WaitManager: |
|
[state-dump] - num active wait requests: 0 |
|
[state-dump] Subscriber: |
|
[state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] Channel WORKER_REF_REMOVED_CHANNEL |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] Channel WORKER_OBJECT_EVICTION |
|
[state-dump] - cumulative subscribe requests: 0 |
|
[state-dump] - cumulative unsubscribe requests: 0 |
|
[state-dump] - active subscribed publishers: 0 |
|
[state-dump] - cumulative published messages: 0 |
|
[state-dump] - cumulative processed messages: 0 |
|
[state-dump] num async plasma notifications: 0 |
|
[state-dump] Remote node managers: |
|
[state-dump] Event stats: |
|
[state-dump] Global stats: 52639 total (35 active) |
|
[state-dump] Queueing time: mean = 17.547 ms, max = 149.071 s, min = 67.000 ns, total = 923.653 s |
|
[state-dump] Execution time: mean = 11.521 ms, total = 606.456 s |
|
[state-dump] Event stats: |
|
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 12579 total (0 active), Execution time: mean = 496.457 us, total = 6.245 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 12579 total (0 active), Execution time: mean = 36.629 us, total = 460.754 ms, Queueing time: mean = 100.690 us, max = 2.189 ms, min = 4.142 us, total = 1.267 s |
|
[state-dump] RaySyncer.OnDemandBroadcasting - 5995 total (1 active), Execution time: mean = 9.377 us, total = 56.216 ms, Queueing time: mean = 81.694 us, max = 3.517 ms, min = 8.344 us, total = 489.757 ms |
|
[state-dump] ObjectManager.UpdateAvailableMemory - 5995 total (0 active), Execution time: mean = 4.965 us, total = 29.765 ms, Queueing time: mean = 95.965 us, max = 9.283 ms, min = 3.503 us, total = 575.309 ms |
|
[state-dump] NodeManager.CheckGC - 5995 total (1 active), Execution time: mean = 2.837 us, total = 17.009 ms, Queueing time: mean = 87.380 us, max = 3.519 ms, min = 6.447 us, total = 523.846 ms |
|
[state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 2999 total (1 active), Execution time: mean = 15.991 us, total = 47.958 ms, Queueing time: mean = 65.497 us, max = 992.162 us, min = 9.895 us, total = 196.426 ms |
|
[state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 2396 total (1 active), Execution time: mean = 435.089 us, total = 1.042 s, Queueing time: mean = 69.156 us, max = 3.232 ms, min = 8.760 us, total = 165.698 ms |
|
[state-dump] NodeManager.ScheduleAndDispatchTasks - 600 total (1 active), Execution time: mean = 13.915 us, total = 8.349 ms, Queueing time: mean = 76.128 us, max = 2.272 ms, min = 17.139 us, total = 45.677 ms |
|
[state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 600 total (1 active), Execution time: mean = 3.043 us, total = 1.826 ms, Queueing time: mean = 169.740 us, max = 2.205 ms, min = 6.247 us, total = 101.844 ms |
|
[state-dump] NodeManager.deadline_timer.flush_free_objects - 600 total (1 active), Execution time: mean = 7.892 us, total = 4.735 ms, Queueing time: mean = 166.396 us, max = 2.209 ms, min = 9.779 us, total = 99.837 ms |
|
[state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 599 total (0 active), Execution time: mean = 104.970 us, total = 62.877 ms, Queueing time: mean = 101.154 us, max = 238.952 us, min = 18.297 us, total = 60.591 ms |
|
[state-dump] NodeManagerService.grpc_server.GetResourceLoad - 599 total (0 active), Execution time: mean = 616.304 us, total = 369.166 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ClusterResourceManager.ResetRemoteNodeView - 201 total (1 active), Execution time: mean = 7.744 us, total = 1.557 ms, Queueing time: mean = 72.366 us, max = 253.106 us, min = 11.307 us, total = 14.546 ms |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 120 total (0 active), Execution time: mean = 1.285 ms, total = 154.193 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.GcsCheckAlive - 120 total (1 active), Execution time: mean = 252.150 us, total = 30.258 ms, Queueing time: mean = 601.992 us, max = 2.274 ms, min = 115.311 us, total = 72.239 ms |
|
[state-dump] NodeManager.deadline_timer.record_metrics - 120 total (1 active), Execution time: mean = 515.791 us, total = 61.895 ms, Queueing time: mean = 341.279 us, max = 1.700 ms, min = 9.061 us, total = 40.954 ms |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 120 total (0 active), Execution time: mean = 47.600 us, total = 5.712 ms, Queueing time: mean = 98.077 us, max = 241.540 us, min = 14.505 us, total = 11.769 ms |
|
[state-dump] ClientConnection.async_read.ProcessMessageHeader - 94 total (21 active), Execution time: mean = 7.655 us, total = 719.592 us, Queueing time: mean = 9.786 s, max = 149.071 s, min = 27.575 us, total = 919.883 s |
|
[state-dump] ClientConnection.async_read.ProcessMessage - 73 total (0 active), Execution time: mean = 828.378 us, total = 60.472 ms, Queueing time: mean = 68.615 us, max = 1.027 ms, min = 2.835 us, total = 5.009 ms |
|
[state-dump] NodeManager.deadline_timer.debug_state_dump - 60 total (1 active), Execution time: mean = 1.639 ms, total = 98.354 ms, Queueing time: mean = 65.243 us, max = 196.608 us, min = 11.928 us, total = 3.915 ms |
|
[state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.317 us, total = 28.982 us, Queueing time: mean = 49.767 us, max = 431.510 us, min = 17.047 us, total = 1.095 ms |
|
[state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 154.352 us, total = 3.241 ms, Queueing time: mean = 162.625 us, max = 432.570 us, min = 33.451 us, total = 3.415 ms |
|
[state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 11.466 us, total = 240.795 us, Queueing time: mean = 2.139 ms, max = 21.287 ms, min = 13.920 us, total = 44.925 ms |
|
[state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 21.296 us, total = 447.224 us, Queueing time: mean = 186.718 us, max = 583.430 us, min = 33.345 us, total = 3.921 ms |
|
[state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.630 ms, total = 34.233 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 217.267 us, total = 2.824 ms, Queueing time: mean = 2.965 ms, max = 8.958 ms, min = 32.982 us, total = 38.551 ms |
|
[state-dump] NodeManager.deadline_timer.print_event_loop_stats - 10 total (1 active, 1 running), Execution time: mean = 2.339 ms, total = 23.393 ms, Queueing time: mean = 43.389 us, max = 107.871 us, min = 17.957 us, total = 433.893 us |
|
[state-dump] RaySyncer.BroadcastMessage - 9 total (0 active), Execution time: mean = 182.148 us, total = 1.639 ms, Queueing time: mean = 558.444 ns, max = 727.000 ns, min = 148.000 ns, total = 5.026 us |
|
[state-dump] - 9 total (0 active), Execution time: mean = 960.889 ns, total = 8.648 us, Queueing time: mean = 82.816 us, max = 165.908 us, min = 25.031 us, total = 745.346 us |
|
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 5 total (0 active), Execution time: mean = 783.478 us, total = 3.917 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 5 total (0 active), Execution time: mean = 235.036 us, total = 1.175 ms, Queueing time: mean = 96.003 us, max = 123.315 us, min = 37.134 us, total = 480.014 us |
|
[state-dump] NodeManagerService.grpc_server.ReturnWorker - 5 total (0 active), Execution time: mean = 551.562 us, total = 2.758 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 5 total (0 active), Execution time: mean = 98.072 us, total = 490.362 us, Queueing time: mean = 50.658 us, max = 140.746 us, min = 19.878 us, total = 253.289 us |
|
[state-dump] WorkerPool.PopWorkerCallback - 5 total (0 active), Execution time: mean = 47.193 us, total = 235.967 us, Queueing time: mean = 28.959 us, max = 38.510 us, min = 20.335 us, total = 144.797 us |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 3 total (1 active), Execution time: mean = 198.863 s, total = 596.590 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.083 us, total = 4.165 us, Queueing time: mean = 301.000 ns, max = 535.000 ns, min = 67.000 ns, total = 602.000 ns |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 2 total (0 active), Execution time: mean = 355.635 us, total = 711.270 us, Queueing time: mean = 123.462 us, max = 133.083 us, min = 113.841 us, total = 246.924 us |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 129.555 us, total = 259.110 us, Queueing time: mean = 655.112 us, max = 1.180 ms, min = 129.843 us, total = 1.310 ms |
|
[state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.377 ms, total = 2.754 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.419 ms, total = 2.419 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 342.194 us, total = 342.194 us, Queueing time: mean = 163.766 us, max = 163.766 us, min = 163.766 us, total = 163.766 us |
|
[state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 96.544 us, total = 96.544 us, Queueing time: mean = 315.750 us, max = 315.750 us, min = 315.750 us, total = 315.750 us |
|
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.019 s, total = 1.019 s, Queueing time: mean = 90.737 us, max = 90.737 us, min = 90.737 us, total = 90.737 us |
|
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.553 ms, total = 1.553 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.873 ms, total = 1.873 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 29.991 us, total = 29.991 us, Queueing time: mean = 111.550 us, max = 111.550 us, min = 111.550 us, total = 111.550 us |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.569 ms, total = 1.569 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.597 ms, total = 1.597 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
[state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 224.222 us, total = 224.222 us, Queueing time: mean = 119.308 us, max = 119.308 us, min = 119.308 us, total = 119.308 us |
|
[state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 137.575 us, total = 137.575 us, Queueing time: mean = 36.079 us, max = 36.079 us, min = 36.079 us, total = 36.079 us |
|
[state-dump] DebugString() time ms: 2 |
|
[state-dump] |
|
[state-dump] |
|
[2025-01-21 05:58:01,514 I 18747 18747] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=3, has creation task exception = false |
|
[2025-01-21 05:58:01,515 I 18747 18747] (raylet) node_manager.cc:1586: Driver (pid=18344) is disconnected. worker_id=01000000ffffffffffffffffffffffffffffffffffffffffffffffff job_id=01000000 |
|
[2025-01-21 05:58:01,520 I 18747 18747] (raylet) worker_pool.cc:692: Job 01000000 already started in worker pool. |
|
[2025-01-21 05:58:01,583 I 18747 18747] (raylet) worker_pool.cc:1119: Force exiting worker whose job has exited 9d27394de3945eeed2c61d91251e608609fbaf1c7ba84a0c6d70972d |
|
[2025-01-21 05:58:01,592 I 18747 18747] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=1, has creation task exception = false |
|
[2025-01-21 05:58:01,635 I 18747 18747] (raylet) main.cc:454: received SIGTERM. Existing local drain request = None |
|
[2025-01-21 05:58:01,635 I 18747 18747] (raylet) main.cc:255: Raylet graceful shutdown triggered, reason = EXPECTED_TERMINATION, reason message = received SIGTERM |
|
[2025-01-21 05:58:01,635 I 18747 18747] (raylet) main.cc:258: Shutting down... |
|
[2025-01-21 05:58:01,635 I 18747 18747] (raylet) accessor.cc:510: Unregistering node node_id=381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d |
|
[2025-01-21 05:58:01,637 I 18747 18747] (raylet) accessor.cc:523: Finished unregistering node info, status = OK node_id=381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d |
|
[2025-01-21 05:58:01,642 I 18747 18747] (raylet) agent_manager.cc:112: Killing agent dashboard_agent/424238335, pid 18839. |
|
[2025-01-21 05:58:01,654 I 18747 18840] (raylet) agent_manager.cc:79: Agent process with name dashboard_agent/424238335 exited, exit code 0. |
|
[2025-01-21 05:58:01,654 I 18747 18747] (raylet) agent_manager.cc:112: Killing agent runtime_env_agent, pid 18841. |
|
[2025-01-21 05:58:01,663 I 18747 18842] (raylet) agent_manager.cc:79: Agent process with name runtime_env_agent exited, exit code 0. |
|
[2025-01-21 05:58:01,663 I 18747 18747] (raylet) io_service_pool.cc:47: IOServicePool is stopped. |
|
[2025-01-21 05:58:01,762 I 18747 18747] (raylet) stats.h:120: Stats module has shutdown. |
|
|