NodeManager:
Node ID: 381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d
Node name: 192.168.0.2
InitialConfigResources: {object_store_memory: 21474836480000, node:192.168.0.2: 10000, node:__internal_head__: 10000, memory: 752056999940000, accelerator_type:A40: 10000, GPU: 20000, CPU: 200000}
ClusterTaskManager:
========== Node: 381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d =================
Infeasible queue length: 0
Schedule queue length: 0
Dispatch queue length: 0
num_waiting_for_resource: 0
num_waiting_for_plasma_memory: 0
num_waiting_for_remote_node_resources: 0
num_worker_not_started_by_job_config_not_exist: 0
num_worker_not_started_by_registration_timeout: 0
num_tasks_waiting_for_workers: 0
num_cancelled_tasks: 0
cluster_resource_scheduler state:
Local id: 688648627895828852
Local resources: {"total":{node:__internal_head__: [10000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], CPU: [200000], memory: [752056999940000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {node:__internal_head__: [10000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], CPU: [200000], memory: [752056999940000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d",} is_draining: 0 is_idle: 1
Cluster resources: node id: 688648627895828852{"total":{object_store_memory: 21474836480000, node:__internal_head__: 10000, node:192.168.0.2: 10000, accelerator_type:A40: 10000, GPU: 20000, memory: 752056999940000, CPU: 200000}}, "available": {object_store_memory: 21474836480000, memory: 752056999940000, node:__internal_head__: 10000, accelerator_type:A40: 10000, GPU: 20000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1}
{ "placment group locations": [], "node to bundles": []}
Waiting tasks size: 0
Number of executing tasks: 0
Number of pinned task arguments: 0
Number of total spilled tasks: 0
Number of spilled waiting tasks: 0
Number of spilled unschedulable tasks: 0
Resource usage { }
Backlog Size per scheduling descriptor :{workerId: num backlogs}:
Running tasks by scheduling class:
==================================================
ClusterResources:
LocalObjectManager:
- num pinned objects: 0
- pinned objects size: 0
- num objects pending restore: 0
- num objects pending spill: 0
- num bytes pending spill: 0
- num bytes currently spilled: 0
- cumulative spill requests: 0
- cumulative restore requests: 0
- spilled objects pending delete: 0
ObjectManager:
- num local objects: 0
- num unfulfilled push requests: 0
- num object pull requests: 0
- num chunks received total: 0
- num chunks received failed (all): 0
- num chunks received failed / cancelled: 0
- num chunks received failed / plasma error: 0
Event stats:
Global stats: 0 total (0 active)
Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
Execution time: mean = -nan s, total = 0.000 s
Event stats:
PushManager:
- num pushes in flight: 0
- num chunks in flight: 0
- num chunks remaining: 0
- max chunks allowed: 409
OwnershipBasedObjectDirectory:
- num listeners: 0
- cumulative location updates: 0
- num location updates per second: 0.000
- num location lookups per second: 0.000
- num locations added per second: 0.000
- num locations removed per second: 0.000
BufferPool:
- create buffer state map size: 0
PullManager:
- num bytes available for pulled objects: 2147483648
- num bytes being pulled (all): 0
- num bytes being pulled / pinned: 0
- get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
- wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
- task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
- first get request bundle: N/A
- first wait request bundle: N/A
- first task request bundle: N/A
- num objects queued: 0
- num objects actively pulled (all): 0
- num objects actively pulled / pinned: 0
- num bundles being pulled: 0
- num pull retries: 0
- max timeout seconds: 0
- max timeout request is already processed. No entry.
WorkerPool:
- registered jobs: 1
- process_failed_job_config_missing: 0
- process_failed_rate_limited: 0
- process_failed_pending_registration: 0
- process_failed_runtime_env_setup_failed: 0
- num PYTHON workers: 20
- num PYTHON drivers: 1
- num PYTHON pending start requests: 0
- num PYTHON pending registration requests: 0
- num object spill callbacks queued: 0
- num object restore queued: 0
- num util functions queued: 0
- num idle workers: 20
TaskDependencyManager:
- task deps map size: 0
- get req map size: 0
- wait req map size: 0
- local objects map size: 0
WaitManager:
- num active wait requests: 0
Subscriber:
Channel WORKER_OBJECT_LOCATIONS_CHANNEL
- cumulative subscribe requests: 0
- cumulative unsubscribe requests: 0
- active subscribed publishers: 0
- cumulative published messages: 0
- cumulative processed messages: 0
Channel WORKER_REF_REMOVED_CHANNEL
- cumulative subscribe requests: 0
- cumulative unsubscribe requests: 0
- active subscribed publishers: 0
- cumulative published messages: 0
- cumulative processed messages: 0
Channel WORKER_OBJECT_EVICTION
- cumulative subscribe requests: 0
- cumulative unsubscribe requests: 0
- active subscribed publishers: 0
- cumulative published messages: 0
- cumulative processed messages: 0
num async plasma notifications: 0
Remote node managers:
Event stats:
Global stats: 54426 total (35 active)
Queueing time: mean = 22.293 ms, max = 149.071 s, min = 67.000 ns, total = 1213.324 s
Execution time: mean = 11.148 ms, total = 606.759 s
Event stats:
NodeManagerService.grpc_server.ReportWorkerBacklog - 13011 total (0 active), Execution time: mean = 496.742 us, total = 6.463 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 13011 total (0 active), Execution time: mean = 36.437 us, total = 474.085 ms, Queueing time: mean = 100.731 us, max = 2.189 ms, min = 4.142 us, total = 1.311 s
RaySyncer.OnDemandBroadcasting - 6196 total (1 active), Execution time: mean = 9.386 us, total = 58.156 ms, Queueing time: mean = 81.554 us, max = 3.517 ms, min = 8.344 us, total = 505.306 ms
ObjectManager.UpdateAvailableMemory - 6196 total (0 active), Execution time: mean = 4.952 us, total = 30.680 ms, Queueing time: mean = 95.721 us, max = 9.283 ms, min = 3.503 us, total = 593.086 ms
NodeManager.CheckGC - 6196 total (1 active), Execution time: mean = 2.838 us, total = 17.584 ms, Queueing time: mean = 87.248 us, max = 3.519 ms, min = 6.447 us, total = 540.589 ms
RayletWorkerPool.deadline_timer.kill_idle_workers - 3100 total (1 active), Execution time: mean = 15.985 us, total = 49.554 ms, Queueing time: mean = 65.442 us, max = 992.162 us, min = 9.895 us, total = 202.872 ms
MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 2476 total (1 active), Execution time: mean = 434.392 us, total = 1.076 s, Queueing time: mean = 69.240 us, max = 3.232 ms, min = 8.760 us, total = 171.439 ms
NodeManager.ScheduleAndDispatchTasks - 621 total (1 active), Execution time: mean = 13.903 us, total = 8.634 ms, Queueing time: mean = 75.585 us, max = 2.272 ms, min = 12.508 us, total = 46.938 ms
NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 620 total (0 active), Execution time: mean = 104.860 us, total = 65.013 ms, Queueing time: mean = 101.329 us, max = 238.952 us, min = 18.297 us, total = 62.824 ms
NodeManager.deadline_timer.spill_objects_when_over_threshold - 620 total (1 active), Execution time: mean = 3.040 us, total = 1.885 ms, Queueing time: mean = 169.907 us, max = 2.205 ms, min = 6.247 us, total = 105.342 ms
NodeManager.deadline_timer.flush_free_objects - 620 total (1 active), Execution time: mean = 7.922 us, total = 4.912 ms, Queueing time: mean = 166.580 us, max = 2.209 ms, min = 9.779 us, total = 103.280 ms
NodeManagerService.grpc_server.GetResourceLoad - 620 total (0 active), Execution time: mean = 616.940 us, total = 382.503 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ClusterResourceManager.ResetRemoteNodeView - 207 total (1 active), Execution time: mean = 7.768 us, total = 1.608 ms, Queueing time: mean = 72.296 us, max = 253.106 us, min = 11.307 us, total = 14.965 ms
ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 124 total (0 active), Execution time: mean = 1.286 ms, total = 159.513 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
NodeManager.GcsCheckAlive - 124 total (1 active), Execution time: mean = 253.644 us, total = 31.452 ms, Queueing time: mean = 601.611 us, max = 2.274 ms, min = 115.311 us, total = 74.600 ms
NodeManager.deadline_timer.record_metrics - 124 total (1 active), Execution time: mean = 516.252 us, total = 64.015 ms, Queueing time: mean = 340.401 us, max = 1.700 ms, min = 9.061 us, total = 42.210 ms
ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 124 total (0 active), Execution time: mean = 47.576 us, total = 5.899 ms, Queueing time: mean = 97.622 us, max = 241.540 us, min = 14.505 us, total = 12.105 ms
ClientConnection.async_read.ProcessMessageHeader - 96 total (21 active), Execution time: mean = 7.737 us, total = 742.799 us, Queueing time: mean = 12.598 s, max = 149.071 s, min = 27.575 us, total = 1209.432 s
ClientConnection.async_read.ProcessMessage - 75 total (0 active), Execution time: mean = 806.725 us, total = 60.504 ms, Queueing time: mean = 67.160 us, max = 1.027 ms, min = 2.835 us, total = 5.037 ms
NodeManager.deadline_timer.debug_state_dump - 62 total (1 active, 1 running), Execution time: mean = 1.638 ms, total = 101.551 ms, Queueing time: mean = 64.617 us, max = 196.608 us, min = 11.928 us, total = 4.006 ms
ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.317 us, total = 28.982 us, Queueing time: mean = 49.767 us, max = 431.510 us, min = 17.047 us, total = 1.095 ms
NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 154.352 us, total = 3.241 ms, Queueing time: mean = 162.625 us, max = 432.570 us, min = 33.451 us, total = 3.415 ms
ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 11.466 us, total = 240.795 us, Queueing time: mean = 2.139 ms, max = 21.287 ms, min = 13.920 us, total = 44.925 ms
ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 21.296 us, total = 447.224 us, Queueing time: mean = 186.718 us, max = 583.430 us, min = 33.345 us, total = 3.921 ms
NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.630 ms, total = 34.233 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 217.267 us, total = 2.824 ms, Queueing time: mean = 2.965 ms, max = 8.958 ms, min = 32.982 us, total = 38.551 ms
NodeManager.deadline_timer.print_event_loop_stats - 11 total (1 active), Execution time: mean = 2.383 ms, total = 26.214 ms, Queueing time: mean = 42.203 us, max = 107.871 us, min = 17.957 us, total = 464.235 us
RaySyncer.BroadcastMessage - 10 total (0 active), Execution time: mean = 182.922 us, total = 1.829 ms, Queueing time: mean = 565.700 ns, max = 727.000 ns, min = 148.000 ns, total = 5.657 us
 - 10 total (0 active), Execution time: mean = 928.300 ns, total = 9.283 us, Queueing time: mean = 76.912 us, max = 165.908 us, min = 23.770 us, total = 769.116 us
NodeManagerService.grpc_server.RequestWorkerLease - 6 total (0 active), Execution time: mean = 776.617 us, total = 4.660 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 6 total (0 active), Execution time: mean = 229.578 us, total = 1.377 ms, Queueing time: mean = 99.063 us, max = 123.315 us, min = 37.134 us, total = 594.378 us
NodeManagerService.grpc_server.ReturnWorker - 6 total (0 active), Execution time: mean = 538.559 us, total = 3.231 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 6 total (0 active), Execution time: mean = 95.401 us, total = 572.407 us, Queueing time: mean = 43.422 us, max = 140.746 us, min = 7.240 us, total = 260.529 us
WorkerPool.PopWorkerCallback - 6 total (0 active), Execution time: mean = 47.279 us, total = 283.677 us, Queueing time: mean = 29.447 us, max = 38.510 us, min = 20.335 us, total = 176.684 us
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 3 total (1 active), Execution time: mean = 198.863 s, total = 596.590 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.083 us, total = 4.165 us, Queueing time: mean = 301.000 ns, max = 535.000 ns, min = 67.000 ns, total = 602.000 ns
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 2 total (0 active), Execution time: mean = 355.635 us, total = 711.270 us, Queueing time: mean = 123.462 us, max = 133.083 us, min = 113.841 us, total = 246.924 us
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 129.555 us, total = 259.110 us, Queueing time: mean = 655.112 us, max = 1.180 ms, min = 129.843 us, total = 1.310 ms
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.377 ms, total = 2.754 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.419 ms, total = 2.419 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 342.194 us, total = 342.194 us, Queueing time: mean = 163.766 us, max = 163.766 us, min = 163.766 us, total = 163.766 us
Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 96.544 us, total = 96.544 us, Queueing time: mean = 315.750 us, max = 315.750 us, min = 315.750 us, total = 315.750 us
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.019 s, total = 1.019 s, Queueing time: mean = 90.737 us, max = 90.737 us, min = 90.737 us, total = 90.737 us
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.553 ms, total = 1.553 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.873 ms, total = 1.873 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 29.991 us, total = 29.991 us, Queueing time: mean = 111.550 us, max = 111.550 us, min = 111.550 us, total = 111.550 us
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.569 ms, total = 1.569 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.597 ms, total = 1.597 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 224.222 us, total = 224.222 us, Queueing time: mean = 119.308 us, max = 119.308 us, min = 119.308 us, total = 119.308 us
ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 137.575 us, total = 137.575 us, Queueing time: mean = 36.079 us, max = 36.079 us, min = 36.079 us, total = 36.079 us
DebugString() time ms: 1