File size: 17,569 Bytes
c011401 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 |
NodeManager: Node ID: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd Node name: 192.168.0.2 InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, memory: 846480855040000, node:__internal_head__: 10000, object_store_memory: 21474836480000, GPU: 20000} ClusterTaskManager: ========== Node: ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd ================= Infeasible queue length: 0 Schedule queue length: 0 Dispatch queue length: 0 num_waiting_for_resource: 0 num_waiting_for_plasma_memory: 0 num_waiting_for_remote_node_resources: 0 num_worker_not_started_by_job_config_not_exist: 0 num_worker_not_started_by_registration_timeout: 0 num_tasks_waiting_for_workers: 0 num_cancelled_tasks: 0 cluster_resource_scheduler state: Local id: -609853312980384924 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [846480855040000]}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",} is_draining: 0 is_idle: 1 Cluster resources: node id: -609853312980384924{"total":{accelerator_type:A40: 10000, node:192.168.0.2: 10000, GPU: 20000, memory: 846480855040000, CPU: 200000, object_store_memory: 21474836480000, node:__internal_head__: 10000}}, "available": {accelerator_type:A40: 10000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, CPU: 200000, node:__internal_head__: 10000, memory: 846480855040000, GPU: 20000}}, "labels":{"ray.io/node_id":"ed5029aed12dfb118ce5ec8eeddd392389018cb6a21a8c84b00cbebd",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} Waiting tasks size: 0 Number of executing tasks: 0 Number of pinned task arguments: 0 Number of total spilled tasks: 0 Number of spilled waiting tasks: 0 Number of spilled unschedulable tasks: 0 Resource usage { } Backlog Size per scheduling descriptor :{workerId: num backlogs}: Running tasks by scheduling class: ================================================== ClusterResources: LocalObjectManager: - num pinned objects: 0 - pinned objects size: 0 - num objects pending restore: 0 - num objects pending spill: 0 - num bytes pending spill: 0 - num bytes currently spilled: 0 - cumulative spill requests: 0 - cumulative restore requests: 0 - spilled objects pending delete: 0 ObjectManager: - num local objects: 0 - num unfulfilled push requests: 0 - num object pull requests: 0 - num chunks received total: 0 - num chunks received failed (all): 0 - num chunks received failed / cancelled: 0 - num chunks received failed / plasma error: 0 Event stats: Global stats: 0 total (0 active) Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s Execution time: mean = -nan s, total = 0.000 s Event stats: PushManager: - num pushes in flight: 0 - num chunks in flight: 0 - num chunks remaining: 0 - max chunks allowed: 409 OwnershipBasedObjectDirectory: - num listeners: 0 - cumulative location updates: 0 - num location updates per second: 0.000 - num location lookups per second: 0.000 - num locations added per second: 0.000 - num locations removed per second: 0.000 BufferPool: - create buffer state map size: 0 PullManager: - num bytes available for pulled objects: 2147483648 - num bytes being pulled (all): 0 - num bytes being pulled / pinned: 0 - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} - first get request bundle: N/A - first wait request bundle: N/A - first task request bundle: N/A - num objects queued: 0 - num objects actively pulled (all): 0 - num objects actively pulled / pinned: 0 - num bundles being pulled: 0 - num pull retries: 0 - max timeout seconds: 0 - max timeout request is already processed. No entry. WorkerPool: - registered jobs: 1 - process_failed_job_config_missing: 0 - process_failed_rate_limited: 0 - process_failed_pending_registration: 0 - process_failed_runtime_env_setup_failed: 0 - num PYTHON workers: 20 - num PYTHON drivers: 1 - num PYTHON pending start requests: 0 - num PYTHON pending registration requests: 0 - num object spill callbacks queued: 0 - num object restore queued: 0 - num util functions queued: 0 - num idle workers: 20 TaskDependencyManager: - task deps map size: 0 - get req map size: 0 - wait req map size: 0 - local objects map size: 0 WaitManager: - num active wait requests: 0 Subscriber: Channel WORKER_OBJECT_EVICTION - cumulative subscribe requests: 0 - cumulative unsubscribe requests: 0 - active subscribed publishers: 0 - cumulative published messages: 0 - cumulative processed messages: 0 Channel WORKER_REF_REMOVED_CHANNEL - cumulative subscribe requests: 0 - cumulative unsubscribe requests: 0 - active subscribed publishers: 0 - cumulative published messages: 0 - cumulative processed messages: 0 Channel WORKER_OBJECT_LOCATIONS_CHANNEL - cumulative subscribe requests: 0 - cumulative unsubscribe requests: 0 - active subscribed publishers: 0 - cumulative published messages: 0 - cumulative processed messages: 0 num async plasma notifications: 0 Remote node managers: Event stats: Global stats: 327336 total (35 active) Queueing time: mean = 157.796 ms, max = 1921.160 s, min = -0.001 s, total = 51652.241 s Execution time: mean = 11.183 ms, total = 3660.583 s Event stats: NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 78537 total (0 active), Execution time: mean = 36.782 us, total = 2.889 s, Queueing time: mean = 107.699 us, max = 3.225 ms, min = 1.438 us, total = 8.458 s NodeManagerService.grpc_server.ReportWorkerBacklog - 78537 total (0 active), Execution time: mean = 525.207 us, total = 41.248 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s NodeManager.CheckGC - 37369 total (1 active), Execution time: mean = 3.184 us, total = 118.978 ms, Queueing time: mean = 99.695 us, max = 51.449 ms, min = 3.386 us, total = 3.726 s RaySyncer.OnDemandBroadcasting - 37369 total (1 active), Execution time: mean = 11.670 us, total = 436.105 ms, Queueing time: mean = 92.257 us, max = 51.440 ms, min = 7.347 us, total = 3.448 s ObjectManager.UpdateAvailableMemory - 37368 total (0 active), Execution time: mean = 5.998 us, total = 224.145 ms, Queueing time: mean = 104.379 us, max = 1.031 ms, min = 2.098 us, total = 3.900 s RayletWorkerPool.deadline_timer.kill_idle_workers - 18695 total (1 active), Execution time: mean = 19.301 us, total = 360.831 ms, Queueing time: mean = 76.564 us, max = 13.722 ms, min = 4.133 us, total = 1.431 s MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 14932 total (1 active), Execution time: mean = 456.426 us, total = 6.815 s, Queueing time: mean = 75.037 us, max = 1.472 ms, min = -0.001 s, total = 1.120 s NodeManager.ScheduleAndDispatchTasks - 3741 total (1 active), Execution time: mean = 15.314 us, total = 57.290 ms, Queueing time: mean = 67.187 us, max = 2.582 ms, min = 6.718 us, total = 251.345 ms NodeManager.deadline_timer.flush_free_objects - 3740 total (1 active), Execution time: mean = 9.528 us, total = 35.635 ms, Queueing time: mean = 182.702 us, max = 2.380 ms, min = 160.000 ns, total = 683.306 ms NodeManager.deadline_timer.spill_objects_when_over_threshold - 3740 total (1 active), Execution time: mean = 2.970 us, total = 11.107 ms, Queueing time: mean = 186.949 us, max = 2.379 ms, min = 4.508 us, total = 699.189 ms NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 3739 total (0 active), Execution time: mean = 102.185 us, total = 382.070 ms, Queueing time: mean = 112.302 us, max = 1.512 ms, min = 4.918 us, total = 419.896 ms NodeManagerService.grpc_server.GetResourceLoad - 3739 total (0 active), Execution time: mean = 625.139 us, total = 2.337 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s ClusterResourceManager.ResetRemoteNodeView - 1247 total (1 active), Execution time: mean = 9.266 us, total = 11.555 ms, Queueing time: mean = 72.901 us, max = 363.446 us, min = 7.807 us, total = 90.907 ms NodeManager.GcsCheckAlive - 748 total (1 active), Execution time: mean = 324.276 us, total = 242.559 ms, Queueing time: mean = 623.636 us, max = 2.445 ms, min = 6.025 us, total = 466.480 ms ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 748 total (0 active), Execution time: mean = 55.115 us, total = 41.226 ms, Queueing time: mean = 105.471 us, max = 307.469 us, min = 11.913 us, total = 78.893 ms ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 748 total (0 active), Execution time: mean = 1.559 ms, total = 1.166 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s NodeManager.deadline_timer.record_metrics - 748 total (1 active), Execution time: mean = 553.899 us, total = 414.316 ms, Queueing time: mean = 394.386 us, max = 2.257 ms, min = 8.454 us, total = 295.001 ms NodeManager.deadline_timer.debug_state_dump - 374 total (1 active, 1 running), Execution time: mean = 1.822 ms, total = 681.602 ms, Queueing time: mean = 73.015 us, max = 183.426 us, min = 11.269 us, total = 27.308 ms ClientConnection.async_read.ProcessMessageHeader - 241 total (21 active), Execution time: mean = 8.391 us, total = 2.022 ms, Queueing time: mean = 214.220 s, max = 1921.160 s, min = 23.644 us, total = 51627.016 s ClientConnection.async_read.ProcessMessage - 220 total (0 active), Execution time: mean = 355.794 us, total = 78.275 ms, Queueing time: mean = 20.354 us, max = 494.085 us, min = 2.397 us, total = 4.478 ms NodeManagerService.grpc_server.RequestWorkerLease - 83 total (0 active), Execution time: mean = 48.706 ms, total = 4.043 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 83 total (0 active), Execution time: mean = 100.509 us, total = 8.342 ms, Queueing time: mean = 180.288 us, max = 674.029 us, min = 6.921 us, total = 14.964 ms NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 73 total (0 active), Execution time: mean = 105.669 us, total = 7.714 ms, Queueing time: mean = 101.687 us, max = 252.805 us, min = 19.400 us, total = 7.423 ms NodeManagerService.grpc_server.ReturnWorker - 73 total (0 active), Execution time: mean = 589.105 us, total = 43.005 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s WorkerPool.PopWorkerCallback - 73 total (0 active), Execution time: mean = 36.946 us, total = 2.697 ms, Queueing time: mean = 164.270 us, max = 539.776 us, min = 15.433 us, total = 11.992 ms - 65 total (0 active), Execution time: mean = 913.708 ns, total = 59.391 us, Queueing time: mean = 98.104 us, max = 237.802 us, min = 20.527 us, total = 6.377 ms RaySyncer.BroadcastMessage - 65 total (0 active), Execution time: mean = 214.865 us, total = 13.966 ms, Queueing time: mean = 691.308 ns, max = 1.206 us, min = 91.000 ns, total = 44.935 us NodeManager.deadline_timer.print_event_loop_stats - 63 total (1 active), Execution time: mean = 2.828 ms, total = 178.176 ms, Queueing time: mean = 70.600 us, max = 169.082 us, min = 13.745 us, total = 4.448 ms ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.435 us, total = 31.563 us, Queueing time: mean = 66.914 us, max = 367.875 us, min = 8.996 us, total = 1.472 ms NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 129.760 us, total = 2.725 ms, Queueing time: mean = 104.003 us, max = 159.261 us, min = 13.379 us, total = 2.184 ms NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.379 ms, total = 28.968 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.184 us, total = 423.864 us, Queueing time: mean = 120.258 us, max = 235.929 us, min = 29.898 us, total = 2.525 ms ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 14.666 us, total = 307.984 us, Queueing time: mean = 125.073 us, max = 550.635 us, min = 8.198 us, total = 2.627 ms PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 439.203 us, total = 5.710 ms, Queueing time: mean = 4.809 ms, max = 12.424 ms, min = 61.413 us, total = 62.521 ms NodeManagerService.grpc_server.CancelWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 84.638 us, total = 846.385 us, Queueing time: mean = 261.769 us, max = 482.894 us, min = 112.774 us, total = 2.618 ms NodeManagerService.grpc_server.CancelWorkerLease - 10 total (0 active), Execution time: mean = 875.644 us, total = 8.756 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 8 total (1 active), Execution time: mean = 449.831 s, total = 3598.645 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 7 total (0 active), Execution time: mean = 422.222 us, total = 2.956 ms, Queueing time: mean = 100.169 us, max = 238.879 us, min = 29.074 us, total = 701.180 us NodeManager.GCTaskFailureReason - 5 total (1 active), Execution time: mean = 8.422 us, total = 42.108 us, Queueing time: mean = 68.676 us, max = 126.591 us, min = 59.744 us, total = 343.378 us ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 137.931 us, total = 275.863 us, Queueing time: mean = 2.023 ms, max = 4.028 ms, min = 18.196 us, total = 4.047 ms ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.513 ms, total = 3.027 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.986 us, total = 3.973 us, Queueing time: mean = 180.500 ns, max = 284.000 ns, min = 77.000 ns, total = 361.000 ns ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.721 ms, total = 1.721 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.482 ms, total = 1.482 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 142.402 us, total = 142.402 us, Queueing time: mean = 115.097 us, max = 115.097 us, min = 115.097 us, total = 115.097 us Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 69.860 us, total = 69.860 us, Queueing time: mean = 301.959 us, max = 301.959 us, min = 301.959 us, total = 301.959 us ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.102 ms, total = 2.102 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 20.913 us, total = 20.913 us, Queueing time: mean = 20.083 us, max = 20.083 us, min = 20.083 us, total = 20.083 us ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 585.655 us, total = 585.655 us, Queueing time: mean = 25.912 us, max = 25.912 us, min = 25.912 us, total = 25.912 us ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 22.315 ms, total = 22.315 ms, Queueing time: mean = 78.086 us, max = 78.086 us, min = 78.086 us, total = 78.086 us ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 71.261 us, total = 71.261 us, Queueing time: mean = 144.822 us, max = 144.822 us, min = 144.822 us, total = 144.822 us ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.647 ms, total = 1.647 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 53.657 us, total = 53.657 us, Queueing time: mean = 375.226 us, max = 375.226 us, min = 375.226 us, total = 375.226 us ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.339 ms, total = 2.339 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s DebugString() time ms: 1 |