NodeManager:
Node ID: 959d61ca307c5c3e7967dc4f62c340eb34c700e436feb4955e4f5877
Node name: 192.168.0.2
InitialConfigResources: {node:192.168.0.2: 10000, accelerator_type:A40: 10000, node:__internal_head__: 10000, memory: 779659989000000, object_store_memory: 21474836480000, CPU: 200000, GPU: 20000}
ClusterTaskManager:
========== Node: 959d61ca307c5c3e7967dc4f62c340eb34c700e436feb4955e4f5877 =================
Infeasible queue length: 0
Schedule queue length: 0
Dispatch queue length: 0
num_waiting_for_resource: 0
num_waiting_for_plasma_memory: 0
num_waiting_for_remote_node_resources: 0
num_worker_not_started_by_job_config_not_exist: 0
num_worker_not_started_by_registration_timeout: 0
num_tasks_waiting_for_workers: 0
num_cancelled_tasks: 0
cluster_resource_scheduler state:
Local id: -3074196584474872412 Local resources: {"total":{node:__internal_head__: [10000], node:192.168.0.2: [10000], GPU: [10000, 10000], CPU: [200000], memory: [779659989000000], object_store_memory: [21474836480000], accelerator_type:A40: [10000]}}, "available": {node:__internal_head__: [10000], node:192.168.0.2: [10000], GPU: [10000, 10000], CPU: [200000], memory: [779659989000000], object_store_memory: [21474836480000], accelerator_type:A40: [10000]}}, "labels":{"ray.io/node_id":"959d61ca307c5c3e7967dc4f62c340eb34c700e436feb4955e4f5877",} is_draining: 0 is_idle: 1 Cluster resources: node id: -3074196584474872412{"total":{node:192.168.0.2: 10000, GPU: 20000, memory: 779659989000000, accelerator_type:A40: 10000, node:__internal_head__: 10000, CPU: 200000, object_store_memory: 21474836480000}}, "available": {node:192.168.0.2: 10000, accelerator_type:A40: 10000, memory: 779659989000000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, GPU: 20000}}, "labels":{"ray.io/node_id":"959d61ca307c5c3e7967dc4f62c340eb34c700e436feb4955e4f5877",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []}
Waiting tasks size: 0
Number of executing tasks: 0
Number of pinned task arguments: 0
Number of total spilled tasks: 0
Number of spilled waiting tasks: 0
Number of spilled unschedulable tasks: 0
Resource usage { }
Backlog Size per scheduling descriptor :{workerId: num backlogs}:
Running tasks by scheduling class:
==================================================

ClusterResources:
LocalObjectManager:
- num pinned objects: 0
- pinned objects size: 0
- num objects pending restore: 0
- num objects pending spill: 0
- num bytes pending spill: 0
- num bytes currently spilled: 0
- cumulative spill requests: 0
- cumulative restore requests: 0
- spilled objects pending delete: 0

ObjectManager:
- num local objects: 0
- num unfulfilled push requests: 0
- num object pull requests: 0
- num chunks received total: 0
- num chunks received failed (all): 0
- num chunks received failed / cancelled: 0
- num chunks received failed / plasma error: 0
Event stats:
Global stats: 0 total (0 active)
Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
Execution time: mean = -nan s, total = 0.000 s
Event stats:
PushManager:
- num pushes in flight: 0
- num chunks in flight: 0
- num chunks remaining: 0
- max chunks allowed: 409
OwnershipBasedObjectDirectory:
- num listeners: 0
- cumulative location updates: 0
- num location updates per second: 0.000
- num location lookups per second: 0.000
- num locations added per second: 0.000
- num locations removed per second: 0.000
BufferPool:
- create buffer state map size: 0
PullManager:
- num bytes available for pulled objects: 2147483648
- num bytes being pulled (all): 0
- num bytes being pulled / pinned: 0
- get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
- wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
- task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
- first get request bundle: N/A
- first wait request bundle: N/A
- first task request bundle: N/A
- num objects queued: 0
- num objects actively pulled (all): 0
- num objects actively pulled / pinned: 0
- num bundles being pulled: 0
- num pull retries: 0
- max timeout seconds: 0
- max timeout request is already processed. No entry.

WorkerPool:
- registered jobs: 1
- process_failed_job_config_missing: 0
- process_failed_rate_limited: 0
- process_failed_pending_registration: 0
- process_failed_runtime_env_setup_failed: 0
- num PYTHON workers: 20
- num PYTHON drivers: 1
- num PYTHON pending start requests: 0
- num PYTHON pending registration requests: 0
- num object spill callbacks queued: 0
- num object restore queued: 0
- num util functions queued: 0
- num idle workers: 20
TaskDependencyManager:
- task deps map size: 0
- get req map size: 0
- wait req map size: 0
- local objects map size: 0
WaitManager:
- num active wait requests: 0
Subscriber:
Channel WORKER_OBJECT_LOCATIONS_CHANNEL
- cumulative subscribe requests: 0
- cumulative unsubscribe requests: 0
- active subscribed publishers: 0
- cumulative published messages: 0
- cumulative processed messages: 0
Channel WORKER_OBJECT_EVICTION
- cumulative subscribe requests: 0
- cumulative unsubscribe requests: 0
- active subscribed publishers: 0
- cumulative published messages: 0
- cumulative processed messages: 0
Channel WORKER_REF_REMOVED_CHANNEL
- cumulative subscribe requests: 0
- cumulative unsubscribe requests: 0
- active subscribed publishers: 0
- cumulative published messages: 0
- cumulative processed messages: 0
num async plasma notifications: 0
Remote node managers:
Event stats:
Global stats: 5540 total (35 active)
Queueing time: mean = 6.402 ms, max = 25.116 s, min = 57.000 ns, total = 35.468 s
Execution time: mean = 511.138 us, total = 2.832 s
Event stats:
	NodeManagerService.grpc_server.ReportWorkerBacklog - 1260 total (0 active), Execution time: mean = 464.660 us, total = 585.472 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 1260 total (0 active), Execution time: mean = 34.055 us, total = 42.910 ms, Queueing time: mean = 92.372 us, max = 396.025 us, min = 4.093 us, total = 116.388 ms
	NodeManager.CheckGC - 600 total (1 active), Execution time: mean = 3.105 us, total = 1.863 ms, Queueing time: mean = 78.095 us, max = 2.991 ms, min = 8.473 us, total = 46.857 ms
	RaySyncer.OnDemandBroadcasting - 600 total (1 active), Execution time: mean = 11.912 us, total = 7.147 ms, Queueing time: mean = 70.762 us, max = 2.979 ms, min = 10.189 us, total = 42.457 ms
	ObjectManager.UpdateAvailableMemory - 600 total (0 active), Execution time: mean = 5.258 us, total = 3.155 ms, Queueing time: mean = 88.884 us, max = 375.599 us, min = 5.091 us, total = 53.330 ms
	RayletWorkerPool.deadline_timer.kill_idle_workers - 300 total (1 active), Execution time: mean = 18.544 us, total = 5.563 ms, Queueing time: mean = 70.306 us, max = 648.172 us, min = 12.544 us, total = 21.092 ms
	MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 240 total (1 active), Execution time: mean = 436.074 us, total = 104.658 ms, Queueing time: mean = 70.864 us, max = 1.639 ms, min = 19.225 us, total = 17.007 ms
	ClientConnection.async_read.ProcessMessageHeader - 86 total (21 active), Execution time: mean = 4.687 us, total = 403.106 us, Queueing time: mean = 407.886 ms, max = 25.116 s, min = 16.268 us, total = 35.078 s
	ClientConnection.async_read.ProcessMessage - 65 total (0 active), Execution time: mean = 750.865 us, total = 48.806 ms, Queueing time: mean = 28.560 us, max = 397.692 us, min = 2.888 us, total = 1.856 ms
	NodeManager.ScheduleAndDispatchTasks - 61 total (1 active), Execution time: mean = 14.586 us, total = 889.764 us, Queueing time: mean = 91.744 us, max = 1.577 ms, min = 20.155 us, total = 5.596 ms
	NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 60 total (0 active), Execution time: mean = 102.042 us, total = 6.123 ms, Queueing time: mean = 82.329 us, max = 181.150 us, min = 13.512 us, total = 4.940 ms
	NodeManager.deadline_timer.spill_objects_when_over_threshold - 60 total (1 active), Execution time: mean = 2.810 us, total = 168.621 us, Queueing time: mean = 176.670 us, max = 1.542 ms, min = 9.629 us, total = 10.600 ms
	NodeManager.deadline_timer.flush_free_objects - 60 total (1 active), Execution time: mean = 7.329 us, total = 439.758 us, Queueing time: mean = 173.334 us, max = 1.546 ms, min = 12.114 us, total = 10.400 ms
	NodeManagerService.grpc_server.GetResourceLoad - 60 total (0 active), Execution time: mean = 549.440 us, total = 32.966 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 977.045 ns, total = 21.495 us, Queueing time: mean = 37.996 us, max = 138.120 us, min = 9.530 us, total = 835.918 us
	ClusterResourceManager.ResetRemoteNodeView - 21 total (1 active), Execution time: mean = 8.918 us, total = 187.281 us, Queueing time: mean = 70.133 us, max = 147.205 us, min = 39.639 us, total = 1.473 ms
	ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 10.226 us, total = 214.738 us, Queueing time: mean = 113.424 us, max = 356.656 us, min = 13.491 us, total = 2.382 ms
	NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 53.386 us, total = 1.121 ms, Queueing time: mean = 49.966 us, max = 200.350 us, min = 3.857 us, total = 1.049 ms
	ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 16.456 us, total = 345.584 us, Queueing time: mean = 89.691 us, max = 194.377 us, min = 20.318 us, total = 1.884 ms
	NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 720.054 us, total = 15.121 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 169.893 us, total = 2.209 ms, Queueing time: mean = 2.849 ms, max = 9.306 ms, min = 19.328 us, total = 37.031 ms
	ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 12 total (0 active), Execution time: mean = 1.311 ms, total = 15.729 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 12 total (0 active), Execution time: mean = 44.350 us, total = 532.199 us, Queueing time: mean = 95.656 us, max = 150.995 us, min = 12.388 us, total = 1.148 ms
	NodeManager.GcsCheckAlive - 12 total (1 active), Execution time: mean = 263.009 us, total = 3.156 ms, Queueing time: mean = 578.199 us, max = 1.238 ms, min = 249.805 us, total = 6.938 ms
	NodeManager.deadline_timer.record_metrics - 12 total (1 active), Execution time: mean = 530.125 us, total = 6.362 ms, Queueing time: mean = 332.377 us, max = 973.453 us, min = 24.971 us, total = 3.989 ms
	NodeManager.deadline_timer.debug_state_dump - 6 total (1 active, 1 running), Execution time: mean = 1.580 ms, total = 9.481 ms, Queueing time: mean = 50.409 us, max = 79.074 us, min = 19.740 us, total = 302.453 us
	 - 3 total (0 active), Execution time: mean = 461.667 ns, total = 1.385 us, Queueing time: mean = 69.898 us, max = 178.722 us, min = 10.326 us, total = 209.695 us
	RaySyncer.BroadcastMessage - 3 total (0 active), Execution time: mean = 137.253 us, total = 411.758 us, Queueing time: mean = 349.667 ns, max = 667.000 ns, min = 77.000 ns, total = 1.049 us
	ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 106.888 us, total = 213.777 us, Queueing time: mean = 535.597 us, max = 1.063 ms, min = 8.076 us, total = 1.071 ms
	RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.131 us, total = 2.263 us, Queueing time: mean = 169.500 ns, max = 282.000 ns, min = 57.000 ns, total = 339.000 ns
	ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.056 ms, total = 2.112 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	NodeManager.deadline_timer.print_event_loop_stats - 2 total (1 active), Execution time: mean = 1.447 ms, total = 2.894 ms, Queueing time: mean = 32.171 us, max = 64.342 us, min = 64.342 us, total = 64.342 us
	ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 452.461 ms, total = 904.923 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 212.923 us, total = 212.923 us, Queueing time: mean = 17.139 us, max = 17.139 us, min = 17.139 us, total = 17.139 us
	NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 1 total (0 active), Execution time: mean = 325.413 us, total = 325.413 us, Queueing time: mean = 141.179 us, max = 141.179 us, min = 141.179 us, total = 141.179 us
	ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.599 ms, total = 1.599 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 9.117 us, total = 9.117 us, Queueing time: mean = 8.677 us, max = 8.677 us, min = 8.677 us, total = 8.677 us
	NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 1 total (0 active), Execution time: mean = 65.318 us, total = 65.318 us, Queueing time: mean = 15.801 us, max = 15.801 us, min = 15.801 us, total = 15.801 us
	ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 777.793 us, total = 777.793 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.415 ms, total = 1.415 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 44.565 us, total = 44.565 us, Queueing time: mean = 308.871 us, max = 308.871 us, min = 308.871 us, total = 308.871 us
	WorkerPool.PopWorkerCallback - 1 total (0 active), Execution time: mean = 46.270 us, total = 46.270 us, Queueing time: mean = 33.074 us, max = 33.074 us, min = 33.074 us, total = 33.074 us
	NodeManagerService.grpc_server.RequestWorkerLease - 1 total (0 active), Execution time: mean = 938.179 us, total = 938.179 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.123 ms, total = 1.123 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.018 s, total = 1.018 s, Queueing time: mean = 12.808 us, max = 12.808 us, min = 12.808 us, total = 12.808 us
	Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 67.617 us, total = 67.617 us, Queueing time: mean = 273.520 us, max = 273.520 us, min = 273.520 us, total = 273.520 us
	ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 878.054 us, total = 878.054 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 237.927 us, total = 237.927 us, Queueing time: mean = 89.366 us, max = 89.366 us, min = 89.366 us, total = 89.366 us
	ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 106.078 us, total = 106.078 us, Queueing time: mean = 9.662 us, max = 9.662 us, min = 9.662 us, total = 9.662 us
	NodeManagerService.grpc_server.ReturnWorker - 1 total (0 active), Execution time: mean = 283.702 us, total = 283.702 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
DebugString() time ms: 1