|
NodeManager: |
|
Node ID: 959d61ca307c5c3e7967dc4f62c340eb34c700e436feb4955e4f5877 |
|
Node name: 192.168.0.2 |
|
InitialConfigResources: {node:192.168.0.2: 10000, accelerator_type:A40: 10000, node:__internal_head__: 10000, memory: 779659989000000, object_store_memory: 21474836480000, CPU: 200000, GPU: 20000} |
|
ClusterTaskManager: |
|
========== Node: 959d61ca307c5c3e7967dc4f62c340eb34c700e436feb4955e4f5877 ================= |
|
Infeasible queue length: 0 |
|
Schedule queue length: 0 |
|
Dispatch queue length: 0 |
|
num_waiting_for_resource: 0 |
|
num_waiting_for_plasma_memory: 0 |
|
num_waiting_for_remote_node_resources: 0 |
|
num_worker_not_started_by_job_config_not_exist: 0 |
|
num_worker_not_started_by_registration_timeout: 0 |
|
num_tasks_waiting_for_workers: 0 |
|
num_cancelled_tasks: 0 |
|
cluster_resource_scheduler state: |
|
Local id: -3074196584474872412 Local resources: {"total":{node:__internal_head__: [10000], node:192.168.0.2: [10000], GPU: [10000, 10000], CPU: [200000], memory: [779659989000000], object_store_memory: [21474836480000], accelerator_type:A40: [10000]}}, "available": {node:__internal_head__: [10000], node:192.168.0.2: [10000], GPU: [10000, 10000], CPU: [200000], memory: [779659989000000], object_store_memory: [21474836480000], accelerator_type:A40: [10000]}}, "labels":{"ray.io/node_id":"959d61ca307c5c3e7967dc4f62c340eb34c700e436feb4955e4f5877",} is_draining: 0 is_idle: 1 Cluster resources: node id: -3074196584474872412{"total":{node:192.168.0.2: 10000, GPU: 20000, memory: 779659989000000, accelerator_type:A40: 10000, node:__internal_head__: 10000, CPU: 200000, object_store_memory: 21474836480000}}, "available": {node:192.168.0.2: 10000, accelerator_type:A40: 10000, memory: 779659989000000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, GPU: 20000}}, "labels":{"ray.io/node_id":"959d61ca307c5c3e7967dc4f62c340eb34c700e436feb4955e4f5877",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} |
|
Waiting tasks size: 0 |
|
Number of executing tasks: 0 |
|
Number of pinned task arguments: 0 |
|
Number of total spilled tasks: 0 |
|
Number of spilled waiting tasks: 0 |
|
Number of spilled unschedulable tasks: 0 |
|
Resource usage { |
|
} |
|
Backlog Size per scheduling descriptor :{workerId: num backlogs}: |
|
|
|
Running tasks by scheduling class: |
|
================================================== |
|
|
|
ClusterResources: |
|
LocalObjectManager: |
|
- num pinned objects: 0 |
|
- pinned objects size: 0 |
|
- num objects pending restore: 0 |
|
- num objects pending spill: 0 |
|
- num bytes pending spill: 0 |
|
- num bytes currently spilled: 0 |
|
- cumulative spill requests: 0 |
|
- cumulative restore requests: 0 |
|
- spilled objects pending delete: 0 |
|
|
|
ObjectManager: |
|
- num local objects: 0 |
|
- num unfulfilled push requests: 0 |
|
- num object pull requests: 0 |
|
- num chunks received total: 0 |
|
- num chunks received failed (all): 0 |
|
- num chunks received failed / cancelled: 0 |
|
- num chunks received failed / plasma error: 0 |
|
Event stats: |
|
Global stats: 0 total (0 active) |
|
Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
Execution time: mean = -nan s, total = 0.000 s |
|
Event stats: |
|
PushManager: |
|
- num pushes in flight: 0 |
|
- num chunks in flight: 0 |
|
- num chunks remaining: 0 |
|
- max chunks allowed: 409 |
|
OwnershipBasedObjectDirectory: |
|
- num listeners: 0 |
|
- cumulative location updates: 0 |
|
- num location updates per second: 0.000 |
|
- num location lookups per second: 0.000 |
|
- num locations added per second: 0.000 |
|
- num locations removed per second: 0.000 |
|
BufferPool: |
|
- create buffer state map size: 0 |
|
PullManager: |
|
- num bytes available for pulled objects: 2147483648 |
|
- num bytes being pulled (all): 0 |
|
- num bytes being pulled / pinned: 0 |
|
- get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
- wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
- task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} |
|
- first get request bundle: N/A |
|
- first wait request bundle: N/A |
|
- first task request bundle: N/A |
|
- num objects queued: 0 |
|
- num objects actively pulled (all): 0 |
|
- num objects actively pulled / pinned: 0 |
|
- num bundles being pulled: 0 |
|
- num pull retries: 0 |
|
- max timeout seconds: 0 |
|
- max timeout request is already processed. No entry. |
|
|
|
WorkerPool: |
|
- registered jobs: 1 |
|
- process_failed_job_config_missing: 0 |
|
- process_failed_rate_limited: 0 |
|
- process_failed_pending_registration: 0 |
|
- process_failed_runtime_env_setup_failed: 0 |
|
- num PYTHON workers: 20 |
|
- num PYTHON drivers: 1 |
|
- num PYTHON pending start requests: 0 |
|
- num PYTHON pending registration requests: 0 |
|
- num object spill callbacks queued: 0 |
|
- num object restore queued: 0 |
|
- num util functions queued: 0 |
|
- num idle workers: 20 |
|
TaskDependencyManager: |
|
- task deps map size: 0 |
|
- get req map size: 0 |
|
- wait req map size: 0 |
|
- local objects map size: 0 |
|
WaitManager: |
|
- num active wait requests: 0 |
|
Subscriber: |
|
Channel WORKER_OBJECT_LOCATIONS_CHANNEL |
|
- cumulative subscribe requests: 0 |
|
- cumulative unsubscribe requests: 0 |
|
- active subscribed publishers: 0 |
|
- cumulative published messages: 0 |
|
- cumulative processed messages: 0 |
|
Channel WORKER_OBJECT_EVICTION |
|
- cumulative subscribe requests: 0 |
|
- cumulative unsubscribe requests: 0 |
|
- active subscribed publishers: 0 |
|
- cumulative published messages: 0 |
|
- cumulative processed messages: 0 |
|
Channel WORKER_REF_REMOVED_CHANNEL |
|
- cumulative subscribe requests: 0 |
|
- cumulative unsubscribe requests: 0 |
|
- active subscribed publishers: 0 |
|
- cumulative published messages: 0 |
|
- cumulative processed messages: 0 |
|
num async plasma notifications: 0 |
|
Remote node managers: |
|
Event stats: |
|
Global stats: 5540 total (35 active) |
|
Queueing time: mean = 6.402 ms, max = 25.116 s, min = 57.000 ns, total = 35.468 s |
|
Execution time: mean = 511.138 us, total = 2.832 s |
|
Event stats: |
|
NodeManagerService.grpc_server.ReportWorkerBacklog - 1260 total (0 active), Execution time: mean = 464.660 us, total = 585.472 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 1260 total (0 active), Execution time: mean = 34.055 us, total = 42.910 ms, Queueing time: mean = 92.372 us, max = 396.025 us, min = 4.093 us, total = 116.388 ms |
|
NodeManager.CheckGC - 600 total (1 active), Execution time: mean = 3.105 us, total = 1.863 ms, Queueing time: mean = 78.095 us, max = 2.991 ms, min = 8.473 us, total = 46.857 ms |
|
RaySyncer.OnDemandBroadcasting - 600 total (1 active), Execution time: mean = 11.912 us, total = 7.147 ms, Queueing time: mean = 70.762 us, max = 2.979 ms, min = 10.189 us, total = 42.457 ms |
|
ObjectManager.UpdateAvailableMemory - 600 total (0 active), Execution time: mean = 5.258 us, total = 3.155 ms, Queueing time: mean = 88.884 us, max = 375.599 us, min = 5.091 us, total = 53.330 ms |
|
RayletWorkerPool.deadline_timer.kill_idle_workers - 300 total (1 active), Execution time: mean = 18.544 us, total = 5.563 ms, Queueing time: mean = 70.306 us, max = 648.172 us, min = 12.544 us, total = 21.092 ms |
|
MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 240 total (1 active), Execution time: mean = 436.074 us, total = 104.658 ms, Queueing time: mean = 70.864 us, max = 1.639 ms, min = 19.225 us, total = 17.007 ms |
|
ClientConnection.async_read.ProcessMessageHeader - 86 total (21 active), Execution time: mean = 4.687 us, total = 403.106 us, Queueing time: mean = 407.886 ms, max = 25.116 s, min = 16.268 us, total = 35.078 s |
|
ClientConnection.async_read.ProcessMessage - 65 total (0 active), Execution time: mean = 750.865 us, total = 48.806 ms, Queueing time: mean = 28.560 us, max = 397.692 us, min = 2.888 us, total = 1.856 ms |
|
NodeManager.ScheduleAndDispatchTasks - 61 total (1 active), Execution time: mean = 14.586 us, total = 889.764 us, Queueing time: mean = 91.744 us, max = 1.577 ms, min = 20.155 us, total = 5.596 ms |
|
NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 60 total (0 active), Execution time: mean = 102.042 us, total = 6.123 ms, Queueing time: mean = 82.329 us, max = 181.150 us, min = 13.512 us, total = 4.940 ms |
|
NodeManager.deadline_timer.spill_objects_when_over_threshold - 60 total (1 active), Execution time: mean = 2.810 us, total = 168.621 us, Queueing time: mean = 176.670 us, max = 1.542 ms, min = 9.629 us, total = 10.600 ms |
|
NodeManager.deadline_timer.flush_free_objects - 60 total (1 active), Execution time: mean = 7.329 us, total = 439.758 us, Queueing time: mean = 173.334 us, max = 1.546 ms, min = 12.114 us, total = 10.400 ms |
|
NodeManagerService.grpc_server.GetResourceLoad - 60 total (0 active), Execution time: mean = 549.440 us, total = 32.966 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 977.045 ns, total = 21.495 us, Queueing time: mean = 37.996 us, max = 138.120 us, min = 9.530 us, total = 835.918 us |
|
ClusterResourceManager.ResetRemoteNodeView - 21 total (1 active), Execution time: mean = 8.918 us, total = 187.281 us, Queueing time: mean = 70.133 us, max = 147.205 us, min = 39.639 us, total = 1.473 ms |
|
ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 10.226 us, total = 214.738 us, Queueing time: mean = 113.424 us, max = 356.656 us, min = 13.491 us, total = 2.382 ms |
|
NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 53.386 us, total = 1.121 ms, Queueing time: mean = 49.966 us, max = 200.350 us, min = 3.857 us, total = 1.049 ms |
|
ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 16.456 us, total = 345.584 us, Queueing time: mean = 89.691 us, max = 194.377 us, min = 20.318 us, total = 1.884 ms |
|
NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 720.054 us, total = 15.121 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 169.893 us, total = 2.209 ms, Queueing time: mean = 2.849 ms, max = 9.306 ms, min = 19.328 us, total = 37.031 ms |
|
ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 12 total (0 active), Execution time: mean = 1.311 ms, total = 15.729 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 12 total (0 active), Execution time: mean = 44.350 us, total = 532.199 us, Queueing time: mean = 95.656 us, max = 150.995 us, min = 12.388 us, total = 1.148 ms |
|
NodeManager.GcsCheckAlive - 12 total (1 active), Execution time: mean = 263.009 us, total = 3.156 ms, Queueing time: mean = 578.199 us, max = 1.238 ms, min = 249.805 us, total = 6.938 ms |
|
NodeManager.deadline_timer.record_metrics - 12 total (1 active), Execution time: mean = 530.125 us, total = 6.362 ms, Queueing time: mean = 332.377 us, max = 973.453 us, min = 24.971 us, total = 3.989 ms |
|
NodeManager.deadline_timer.debug_state_dump - 6 total (1 active, 1 running), Execution time: mean = 1.580 ms, total = 9.481 ms, Queueing time: mean = 50.409 us, max = 79.074 us, min = 19.740 us, total = 302.453 us |
|
- 3 total (0 active), Execution time: mean = 461.667 ns, total = 1.385 us, Queueing time: mean = 69.898 us, max = 178.722 us, min = 10.326 us, total = 209.695 us |
|
RaySyncer.BroadcastMessage - 3 total (0 active), Execution time: mean = 137.253 us, total = 411.758 us, Queueing time: mean = 349.667 ns, max = 667.000 ns, min = 77.000 ns, total = 1.049 us |
|
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 106.888 us, total = 213.777 us, Queueing time: mean = 535.597 us, max = 1.063 ms, min = 8.076 us, total = 1.071 ms |
|
RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.131 us, total = 2.263 us, Queueing time: mean = 169.500 ns, max = 282.000 ns, min = 57.000 ns, total = 339.000 ns |
|
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.056 ms, total = 2.112 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
NodeManager.deadline_timer.print_event_loop_stats - 2 total (1 active), Execution time: mean = 1.447 ms, total = 2.894 ms, Queueing time: mean = 32.171 us, max = 64.342 us, min = 64.342 us, total = 64.342 us |
|
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 452.461 ms, total = 904.923 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 212.923 us, total = 212.923 us, Queueing time: mean = 17.139 us, max = 17.139 us, min = 17.139 us, total = 17.139 us |
|
NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 1 total (0 active), Execution time: mean = 325.413 us, total = 325.413 us, Queueing time: mean = 141.179 us, max = 141.179 us, min = 141.179 us, total = 141.179 us |
|
ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.599 ms, total = 1.599 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 9.117 us, total = 9.117 us, Queueing time: mean = 8.677 us, max = 8.677 us, min = 8.677 us, total = 8.677 us |
|
NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 1 total (0 active), Execution time: mean = 65.318 us, total = 65.318 us, Queueing time: mean = 15.801 us, max = 15.801 us, min = 15.801 us, total = 15.801 us |
|
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 777.793 us, total = 777.793 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.415 ms, total = 1.415 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 44.565 us, total = 44.565 us, Queueing time: mean = 308.871 us, max = 308.871 us, min = 308.871 us, total = 308.871 us |
|
WorkerPool.PopWorkerCallback - 1 total (0 active), Execution time: mean = 46.270 us, total = 46.270 us, Queueing time: mean = 33.074 us, max = 33.074 us, min = 33.074 us, total = 33.074 us |
|
NodeManagerService.grpc_server.RequestWorkerLease - 1 total (0 active), Execution time: mean = 938.179 us, total = 938.179 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.123 ms, total = 1.123 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.018 s, total = 1.018 s, Queueing time: mean = 12.808 us, max = 12.808 us, min = 12.808 us, total = 12.808 us |
|
Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 67.617 us, total = 67.617 us, Queueing time: mean = 273.520 us, max = 273.520 us, min = 273.520 us, total = 273.520 us |
|
ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 878.054 us, total = 878.054 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 237.927 us, total = 237.927 us, Queueing time: mean = 89.366 us, max = 89.366 us, min = 89.366 us, total = 89.366 us |
|
ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 106.078 us, total = 106.078 us, Queueing time: mean = 9.662 us, max = 9.662 us, min = 9.662 us, total = 9.662 us |
|
NodeManagerService.grpc_server.ReturnWorker - 1 total (0 active), Execution time: mean = 283.702 us, total = 283.702 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
DebugString() time ms: 1 |
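
The resource quantities in this dump appear to be fixed-point integers scaled by 10000. That reading is an assumption, but the dump supports it: `object_store_memory: 21474836480000` divided by 10000 is 2147483648, which matches the PullManager's `num bytes available for pulled objects`, and `CPU: 200000` would correspond to 20 CPUs, matching the 20 idle Python workers. A minimal sketch for decoding the `InitialConfigResources` line under that assumption follows; `RESOURCE_SCALE` and `parse_initial_resources` are hypothetical names for illustration, not part of any Ray API.

```python
import re

# Assumption: quantities are fixed-point integers scaled by 10000
# (so 10000 == 1.0). Inferred from the dump, not taken from a Ray API.
RESOURCE_SCALE = 10_000


def parse_initial_resources(line: str) -> dict[str, float]:
    """Parse an 'InitialConfigResources: {...}' line into real-valued quantities."""
    match = re.search(r"InitialConfigResources:\s*\{(.*)\}", line)
    if match is None:
        raise ValueError("not an InitialConfigResources line")
    resources = {}
    for item in match.group(1).split(", "):
        # Resource names may contain ':' themselves (e.g. 'node:192.168.0.2'),
        # so split on the last ': ' only.
        name, raw = item.rsplit(": ", 1)
        resources[name] = int(raw) / RESOURCE_SCALE
    return resources


# The line is copied from the dump above, including its trailing ' |'.
line = (
    "InitialConfigResources: {node:192.168.0.2: 10000, accelerator_type:A40: 10000, "
    "node:__internal_head__: 10000, memory: 779659989000000, "
    "object_store_memory: 21474836480000, CPU: 200000, GPU: 20000} |"
)
resources = parse_initial_resources(line)
print(resources["CPU"], resources["GPU"], resources["object_store_memory"])
# prints: 20.0 2.0 2147483648.0
```

Under this reading the node advertises 20 CPUs, 2 GPUs (one A40 accelerator type), and a 2 GiB object store, all fully available at the time of the dump.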