|
NodeManager: |
|
Node ID: 381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d |
|
Node name: 192.168.0.2 |
|
InitialConfigResources: {object_store_memory: 21474836480000, node:192.168.0.2: 10000, node:__internal_head__: 10000, memory: 752056999940000, accelerator_type:A40: 10000, GPU: 20000, CPU: 200000} |
|
ClusterTaskManager: |
|
========== Node: 381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d ================= |
|
Infeasible queue length: 0 |
|
Schedule queue length: 0 |
|
Dispatch queue length: 0 |
|
num_waiting_for_resource: 0 |
|
num_waiting_for_plasma_memory: 0 |
|
num_waiting_for_remote_node_resources: 0 |
|
num_worker_not_started_by_job_config_not_exist: 0 |
|
num_worker_not_started_by_registration_timeout: 0 |
|
num_tasks_waiting_for_workers: 0 |
|
num_cancelled_tasks: 0 |
|
cluster_resource_scheduler state: |
|
Local id: 688648627895828852 Local resources: {"total":{node:__internal_head__: [10000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], CPU: [200000], memory: [752056999940000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {node:__internal_head__: [10000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], CPU: [200000], memory: [752056999940000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d",} is_draining: 0 is_idle: 1 Cluster resources: node id: 688648627895828852{"total":{object_store_memory: 21474836480000, node:__internal_head__: 10000, node:192.168.0.2: 10000, accelerator_type:A40: 10000, GPU: 20000, memory: 752056999940000, CPU: 200000}}, "available": {object_store_memory: 21474836480000, memory: 752056999940000, node:__internal_head__: 10000, accelerator_type:A40: 10000, GPU: 20000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} |
|
Waiting tasks size: 0 |
|
Number of executing tasks: 0 |
|
Number of pinned task arguments: 0 |
|
Number of total spilled tasks: 0 |
|
Number of spilled waiting tasks: 0 |
|
Number of spilled unschedulable tasks: 0 |
|
Resource usage { |
|
} |
|
Backlog Size per scheduling descriptor :{workerId: num backlogs}: |
|
|
|
Running tasks by scheduling class: |
|
================================================== |
|
|
|
ClusterResources: |
|
LocalObjectManager: |
|
- num pinned objects: 0 |
|
- pinned objects size: 0 |
|
- num objects pending restore: 0 |
|
- num objects pending spill: 0 |
|
- num bytes pending spill: 0 |
|
- num bytes currently spilled: 0 |
|
- cumulative spill requests: 0 |
|
- cumulative restore requests: 0 |
|
- spilled objects pending delete: 0 |
|
|
|
ObjectManager: |
|
- num local objects: 0 |
|
- num unfulfilled push requests: 0 |
|
- num object pull requests: 0 |
|
- num chunks received total: 0 |
|
- num chunks received failed (all): 0 |
|
- num chunks received failed / cancelled: 0 |
|
- num chunks received failed / plasma error: 0 |
|
Event stats: |
|
Global stats: 0 total (0 active) |
|
Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
Execution time: mean = -nan s, total = 0.000 s |
|
Event stats: |
|
PushManager: |
|
- num pushes in flight: 0 |
|
- num chunks in flight: 0 |
|
- num chunks remaining: 0 |
|
- max chunks allowed: 409 |
|
OwnershipBasedObjectDirectory: |
|
- num listeners: 0 |
|
- cumulative location updates: 0 |
|
- num location updates per second: 0.000 |
|
- num location lookups per second: 0.000 |
|
- num locations added per second: 0.000 |
|
- num locations removed per second: 0.000 |
|
BufferPool: |
|
- create buffer state map size: 0 |
|
PullManager: |
|
- num bytes available for pulled objects: 2147483648 |
|
- num bytes being pulled (all): 0 |
|
- num bytes being pulled / pinned: 0 |
|
- get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
- wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
- task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
- first get request bundle: N/A |
|
- first wait request bundle: N/A |
|
- first task request bundle: N/A |
|
- num objects queued: 0 |
|
- num objects actively pulled (all): 0 |
|
- num objects actively pulled / pinned: 0 |
|
- num bundles being pulled: 0 |
|
- num pull retries: 0 |
|
- max timeout seconds: 0 |
|
- max timeout request is already processed. No entry. |
|
|
|
WorkerPool: |
|
- registered jobs: 1 |
|
- process_failed_job_config_missing: 0 |
|
- process_failed_rate_limited: 0 |
|
- process_failed_pending_registration: 0 |
|
- process_failed_runtime_env_setup_failed: 0 |
|
- num PYTHON workers: 20 |
|
- num PYTHON drivers: 1 |
|
- num PYTHON pending start requests: 0 |
|
- num PYTHON pending registration requests: 0 |
|
- num object spill callbacks queued: 0 |
|
- num object restore queued: 0 |
|
- num util functions queued: 0 |
|
- num idle workers: 20 |
|
TaskDependencyManager: |
|
- task deps map size: 0 |
|
- get req map size: 0 |
|
- wait req map size: 0 |
|
- local objects map size: 0 |
|
WaitManager: |
|
- num active wait requests: 0 |
|
Subscriber: |
|
Channel WORKER_OBJECT_LOCATIONS_CHANNEL |
|
- cumulative subscribe requests: 0 |
|
- cumulative unsubscribe requests: 0 |
|
- active subscribed publishers: 0 |
|
- cumulative published messages: 0 |
|
- cumulative processed messages: 0 |
|
Channel WORKER_REF_REMOVED_CHANNEL |
|
- cumulative subscribe requests: 0 |
|
- cumulative unsubscribe requests: 0 |
|
- active subscribed publishers: 0 |
|
- cumulative published messages: 0 |
|
- cumulative processed messages: 0 |
|
Channel WORKER_OBJECT_EVICTION |
|
- cumulative subscribe requests: 0 |
|
- cumulative unsubscribe requests: 0 |
|
- active subscribed publishers: 0 |
|
- cumulative published messages: 0 |
|
- cumulative processed messages: 0 |
|
num async plasma notifications: 0 |
|
Remote node managers: |
|
Event stats: |
|
Global stats: 54426 total (35 active) |
|
Queueing time: mean = 22.293 ms, max = 149.071 s, min = 67.000 ns, total = 1213.324 s |
|
Execution time: mean = 11.148 ms, total = 606.759 s |
|
Event stats: |
|
NodeManagerService.grpc_server.ReportWorkerBacklog - 13011 total (0 active), Execution time: mean = 496.742 us, total = 6.463 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 13011 total (0 active), Execution time: mean = 36.437 us, total = 474.085 ms, Queueing time: mean = 100.731 us, max = 2.189 ms, min = 4.142 us, total = 1.311 s |
|
RaySyncer.OnDemandBroadcasting - 6196 total (1 active), Execution time: mean = 9.386 us, total = 58.156 ms, Queueing time: mean = 81.554 us, max = 3.517 ms, min = 8.344 us, total = 505.306 ms |
|
ObjectManager.UpdateAvailableMemory - 6196 total (0 active), Execution time: mean = 4.952 us, total = 30.680 ms, Queueing time: mean = 95.721 us, max = 9.283 ms, min = 3.503 us, total = 593.086 ms |
|
NodeManager.CheckGC - 6196 total (1 active), Execution time: mean = 2.838 us, total = 17.584 ms, Queueing time: mean = 87.248 us, max = 3.519 ms, min = 6.447 us, total = 540.589 ms |
|
RayletWorkerPool.deadline_timer.kill_idle_workers - 3100 total (1 active), Execution time: mean = 15.985 us, total = 49.554 ms, Queueing time: mean = 65.442 us, max = 992.162 us, min = 9.895 us, total = 202.872 ms |
|
MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 2476 total (1 active), Execution time: mean = 434.392 us, total = 1.076 s, Queueing time: mean = 69.240 us, max = 3.232 ms, min = 8.760 us, total = 171.439 ms |
|
NodeManager.ScheduleAndDispatchTasks - 621 total (1 active), Execution time: mean = 13.903 us, total = 8.634 ms, Queueing time: mean = 75.585 us, max = 2.272 ms, min = 12.508 us, total = 46.938 ms |
|
NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 620 total (0 active), Execution time: mean = 104.860 us, total = 65.013 ms, Queueing time: mean = 101.329 us, max = 238.952 us, min = 18.297 us, total = 62.824 ms |
|
NodeManager.deadline_timer.spill_objects_when_over_threshold - 620 total (1 active), Execution time: mean = 3.040 us, total = 1.885 ms, Queueing time: mean = 169.907 us, max = 2.205 ms, min = 6.247 us, total = 105.342 ms |
|
NodeManager.deadline_timer.flush_free_objects - 620 total (1 active), Execution time: mean = 7.922 us, total = 4.912 ms, Queueing time: mean = 166.580 us, max = 2.209 ms, min = 9.779 us, total = 103.280 ms |
|
NodeManagerService.grpc_server.GetResourceLoad - 620 total (0 active), Execution time: mean = 616.940 us, total = 382.503 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
ClusterResourceManager.ResetRemoteNodeView - 207 total (1 active), Execution time: mean = 7.768 us, total = 1.608 ms, Queueing time: mean = 72.296 us, max = 253.106 us, min = 11.307 us, total = 14.965 ms |
|
ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 124 total (0 active), Execution time: mean = 1.286 ms, total = 159.513 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
NodeManager.GcsCheckAlive - 124 total (1 active), Execution time: mean = 253.644 us, total = 31.452 ms, Queueing time: mean = 601.611 us, max = 2.274 ms, min = 115.311 us, total = 74.600 ms |
|
NodeManager.deadline_timer.record_metrics - 124 total (1 active), Execution time: mean = 516.252 us, total = 64.015 ms, Queueing time: mean = 340.401 us, max = 1.700 ms, min = 9.061 us, total = 42.210 ms |
|
ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 124 total (0 active), Execution time: mean = 47.576 us, total = 5.899 ms, Queueing time: mean = 97.622 us, max = 241.540 us, min = 14.505 us, total = 12.105 ms |
|
ClientConnection.async_read.ProcessMessageHeader - 96 total (21 active), Execution time: mean = 7.737 us, total = 742.799 us, Queueing time: mean = 12.598 s, max = 149.071 s, min = 27.575 us, total = 1209.432 s |
|
ClientConnection.async_read.ProcessMessage - 75 total (0 active), Execution time: mean = 806.725 us, total = 60.504 ms, Queueing time: mean = 67.160 us, max = 1.027 ms, min = 2.835 us, total = 5.037 ms |
|
NodeManager.deadline_timer.debug_state_dump - 62 total (1 active, 1 running), Execution time: mean = 1.638 ms, total = 101.551 ms, Queueing time: mean = 64.617 us, max = 196.608 us, min = 11.928 us, total = 4.006 ms |
|
ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.317 us, total = 28.982 us, Queueing time: mean = 49.767 us, max = 431.510 us, min = 17.047 us, total = 1.095 ms |
|
NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 154.352 us, total = 3.241 ms, Queueing time: mean = 162.625 us, max = 432.570 us, min = 33.451 us, total = 3.415 ms |
|
ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 11.466 us, total = 240.795 us, Queueing time: mean = 2.139 ms, max = 21.287 ms, min = 13.920 us, total = 44.925 ms |
|
ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 21.296 us, total = 447.224 us, Queueing time: mean = 186.718 us, max = 583.430 us, min = 33.345 us, total = 3.921 ms |
|
NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.630 ms, total = 34.233 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 217.267 us, total = 2.824 ms, Queueing time: mean = 2.965 ms, max = 8.958 ms, min = 32.982 us, total = 38.551 ms |
|
NodeManager.deadline_timer.print_event_loop_stats - 11 total (1 active), Execution time: mean = 2.383 ms, total = 26.214 ms, Queueing time: mean = 42.203 us, max = 107.871 us, min = 17.957 us, total = 464.235 us |
|
RaySyncer.BroadcastMessage - 10 total (0 active), Execution time: mean = 182.922 us, total = 1.829 ms, Queueing time: mean = 565.700 ns, max = 727.000 ns, min = 148.000 ns, total = 5.657 us |
|
- 10 total (0 active), Execution time: mean = 928.300 ns, total = 9.283 us, Queueing time: mean = 76.912 us, max = 165.908 us, min = 23.770 us, total = 769.116 us |
|
NodeManagerService.grpc_server.RequestWorkerLease - 6 total (0 active), Execution time: mean = 776.617 us, total = 4.660 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 6 total (0 active), Execution time: mean = 229.578 us, total = 1.377 ms, Queueing time: mean = 99.063 us, max = 123.315 us, min = 37.134 us, total = 594.378 us |
|
NodeManagerService.grpc_server.ReturnWorker - 6 total (0 active), Execution time: mean = 538.559 us, total = 3.231 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 6 total (0 active), Execution time: mean = 95.401 us, total = 572.407 us, Queueing time: mean = 43.422 us, max = 140.746 us, min = 7.240 us, total = 260.529 us |
|
WorkerPool.PopWorkerCallback - 6 total (0 active), Execution time: mean = 47.279 us, total = 283.677 us, Queueing time: mean = 29.447 us, max = 38.510 us, min = 20.335 us, total = 176.684 us |
|
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 3 total (1 active), Execution time: mean = 198.863 s, total = 596.590 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.083 us, total = 4.165 us, Queueing time: mean = 301.000 ns, max = 535.000 ns, min = 67.000 ns, total = 602.000 ns |
|
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 2 total (0 active), Execution time: mean = 355.635 us, total = 711.270 us, Queueing time: mean = 123.462 us, max = 133.083 us, min = 113.841 us, total = 246.924 us |
|
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 129.555 us, total = 259.110 us, Queueing time: mean = 655.112 us, max = 1.180 ms, min = 129.843 us, total = 1.310 ms |
|
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.377 ms, total = 2.754 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.419 ms, total = 2.419 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 342.194 us, total = 342.194 us, Queueing time: mean = 163.766 us, max = 163.766 us, min = 163.766 us, total = 163.766 us |
|
Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 96.544 us, total = 96.544 us, Queueing time: mean = 315.750 us, max = 315.750 us, min = 315.750 us, total = 315.750 us |
|
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.019 s, total = 1.019 s, Queueing time: mean = 90.737 us, max = 90.737 us, min = 90.737 us, total = 90.737 us |
|
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.553 ms, total = 1.553 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.873 ms, total = 1.873 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 29.991 us, total = 29.991 us, Queueing time: mean = 111.550 us, max = 111.550 us, min = 111.550 us, total = 111.550 us |
|
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.569 ms, total = 1.569 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.597 ms, total = 1.597 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 224.222 us, total = 224.222 us, Queueing time: mean = 119.308 us, max = 119.308 us, min = 119.308 us, total = 119.308 us |
|
ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 137.575 us, total = 137.575 us, Queueing time: mean = 36.079 us, max = 36.079 us, min = 36.079 us, total = 36.079 us |
|
DebugString() time ms: 1 |
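
The resource figures in the dump above (for example `CPU: 200000` and `GPU: 20000`) appear to be fixed-point integers scaled by 10000. This is an assumption inferred from the dump, not something it states: `object_store_memory: 21474836480000` divided by 10000 gives 2147483648, which is exactly the PullManager's "num bytes available for pulled objects", and `CPU: 200000` would then be 20 CPUs, the same as the number of PYTHON workers. A minimal sketch for decoding the `InitialConfigResources` map under that assumption (`RESOURCE_SCALE` and `decode_resources` are hypothetical names, not part of Ray):

```python
# Assumption: resource quantities are printed as fixed-point integers
# scaled by 10000. This is inferred from the dump, not confirmed by it.
RESOURCE_SCALE = 10_000


def decode_resources(text: str) -> dict[str, float]:
    """Parse a '{name: value, ...}' resource map and undo the assumed scaling."""
    body = text.strip().strip("{}")
    resources = {}
    for entry in body.split(","):
        # Split on the last colon: names such as 'node:192.168.0.2'
        # contain colons themselves.
        name, _, raw = entry.rpartition(":")
        resources[name.strip()] = int(raw) / RESOURCE_SCALE
    return resources


# The InitialConfigResources line copied from the dump.
initial = (
    "{object_store_memory: 21474836480000, node:192.168.0.2: 10000, "
    "node:__internal_head__: 10000, memory: 752056999940000, "
    "accelerator_type:A40: 10000, GPU: 20000, CPU: 200000}"
)

decoded = decode_resources(initial)
for name, value in decoded.items():
    print(name, value)
```

Read this way, the node reports 20 CPUs, 2 GPUs, and a 2 GiB object store, with all of it listed as available because no tasks are executing.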