NodeManager:
Node ID: 381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d
Node name: 192.168.0.2
InitialConfigResources: {object_store_memory: 21474836480000, node:192.168.0.2: 10000, node:__internal_head__: 10000, memory: 752056999940000, accelerator_type:A40: 10000, GPU: 20000, CPU: 200000}
ClusterTaskManager:
========== Node: 381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d =================
Infeasible queue length: 0
Schedule queue length: 0
Dispatch queue length: 0
num_waiting_for_resource: 0
num_waiting_for_plasma_memory: 0
num_waiting_for_remote_node_resources: 0
num_worker_not_started_by_job_config_not_exist: 0
num_worker_not_started_by_registration_timeout: 0
num_tasks_waiting_for_workers: 0
num_cancelled_tasks: 0
cluster_resource_scheduler state: 
Local id: 688648627895828852 Local resources: {"total":{node:__internal_head__: [10000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], CPU: [200000], memory: [752056999940000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {node:__internal_head__: [10000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], CPU: [200000], memory: [752056999940000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d",} is_draining: 0 is_idle: 1 Cluster resources: node id: 688648627895828852{"total":{object_store_memory: 21474836480000, node:__internal_head__: 10000, node:192.168.0.2: 10000, accelerator_type:A40: 10000, GPU: 20000, memory: 752056999940000, CPU: 200000}}, "available": {object_store_memory: 21474836480000, memory: 752056999940000, node:__internal_head__: 10000, accelerator_type:A40: 10000, GPU: 20000, node:192.168.0.2: 10000, CPU: 200000}}, "labels":{"ray.io/node_id":"381e636a10e4140b2e9620d2650d6a018da067c3591f2305edfa793d",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []}
Waiting tasks size: 0
Number of executing tasks: 0
Number of pinned task arguments: 0
Number of total spilled tasks: 0
Number of spilled waiting tasks: 0
Number of spilled unschedulable tasks: 0
Resource usage {
}
Backlog Size per scheduling descriptor :{workerId: num backlogs}:

Running tasks by scheduling class:
==================================================

ClusterResources:
LocalObjectManager:
- num pinned objects: 0
- pinned objects size: 0
- num objects pending restore: 0
- num objects pending spill: 0
- num bytes pending spill: 0
- num bytes currently spilled: 0
- cumulative spill requests: 0
- cumulative restore requests: 0
- spilled objects pending delete: 0

ObjectManager:
- num local objects: 0
- num unfulfilled push requests: 0
- num object pull requests: 0
- num chunks received total: 0
- num chunks received failed (all): 0
- num chunks received failed / cancelled: 0
- num chunks received failed / plasma error: 0
Event stats:
Global stats: 0 total (0 active)
Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
Execution time:  mean = -nan s, total = 0.000 s
Event stats:
PushManager:
- num pushes in flight: 0
- num chunks in flight: 0
- num chunks remaining: 0
- max chunks allowed: 409
OwnershipBasedObjectDirectory:
- num listeners: 0
- cumulative location updates: 0
- num location updates per second: 0.000
- num location lookups per second: 0.000
- num locations added per second: 0.000
- num locations removed per second: 0.000
BufferPool:
- create buffer state map size: 0
PullManager:
- num bytes available for pulled objects: 2147483648
- num bytes being pulled (all): 0
- num bytes being pulled / pinned: 0
- get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
- wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
- task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
- first get request bundle: N/A
- first wait request bundle: N/A
- first task request bundle: N/A
- num objects queued: 0
- num objects actively pulled (all): 0
- num objects actively pulled / pinned: 0
- num bundles being pulled: 0
- num pull retries: 0
- max timeout seconds: 0
- max timeout request is already processed. No entry.

WorkerPool:
- registered jobs: 1
- process_failed_job_config_missing: 0
- process_failed_rate_limited: 0
- process_failed_pending_registration: 0
- process_failed_runtime_env_setup_failed: 0
- num PYTHON workers: 20
- num PYTHON drivers: 1
- num PYTHON pending start requests: 0
- num PYTHON pending registration requests: 0
- num object spill callbacks queued: 0
- num object restore queued: 0
- num util functions queued: 0
- num idle workers: 20
TaskDependencyManager:
- task deps map size: 0
- get req map size: 0
- wait req map size: 0
- local objects map size: 0
WaitManager:
- num active wait requests: 0
Subscriber:
Channel WORKER_OBJECT_LOCATIONS_CHANNEL
- cumulative subscribe requests: 0
- cumulative unsubscribe requests: 0
- active subscribed publishers: 0
- cumulative published messages: 0
- cumulative processed messages: 0
Channel WORKER_REF_REMOVED_CHANNEL
- cumulative subscribe requests: 0
- cumulative unsubscribe requests: 0
- active subscribed publishers: 0
- cumulative published messages: 0
- cumulative processed messages: 0
Channel WORKER_OBJECT_EVICTION
- cumulative subscribe requests: 0
- cumulative unsubscribe requests: 0
- active subscribed publishers: 0
- cumulative published messages: 0
- cumulative processed messages: 0
num async plasma notifications: 0
Remote node managers: 
Event stats:
Global stats: 54426 total (35 active)
Queueing time: mean = 22.293 ms, max = 149.071 s, min = 67.000 ns, total = 1213.324 s
Execution time:  mean = 11.148 ms, total = 606.759 s
Event stats:
	NodeManagerService.grpc_server.ReportWorkerBacklog - 13011 total (0 active), Execution time: mean = 496.742 us, total = 6.463 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 13011 total (0 active), Execution time: mean = 36.437 us, total = 474.085 ms, Queueing time: mean = 100.731 us, max = 2.189 ms, min = 4.142 us, total = 1.311 s
	RaySyncer.OnDemandBroadcasting - 6196 total (1 active), Execution time: mean = 9.386 us, total = 58.156 ms, Queueing time: mean = 81.554 us, max = 3.517 ms, min = 8.344 us, total = 505.306 ms
	ObjectManager.UpdateAvailableMemory - 6196 total (0 active), Execution time: mean = 4.952 us, total = 30.680 ms, Queueing time: mean = 95.721 us, max = 9.283 ms, min = 3.503 us, total = 593.086 ms
	NodeManager.CheckGC - 6196 total (1 active), Execution time: mean = 2.838 us, total = 17.584 ms, Queueing time: mean = 87.248 us, max = 3.519 ms, min = 6.447 us, total = 540.589 ms
	RayletWorkerPool.deadline_timer.kill_idle_workers - 3100 total (1 active), Execution time: mean = 15.985 us, total = 49.554 ms, Queueing time: mean = 65.442 us, max = 992.162 us, min = 9.895 us, total = 202.872 ms
	MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 2476 total (1 active), Execution time: mean = 434.392 us, total = 1.076 s, Queueing time: mean = 69.240 us, max = 3.232 ms, min = 8.760 us, total = 171.439 ms
	NodeManager.ScheduleAndDispatchTasks - 621 total (1 active), Execution time: mean = 13.903 us, total = 8.634 ms, Queueing time: mean = 75.585 us, max = 2.272 ms, min = 12.508 us, total = 46.938 ms
	NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 620 total (0 active), Execution time: mean = 104.860 us, total = 65.013 ms, Queueing time: mean = 101.329 us, max = 238.952 us, min = 18.297 us, total = 62.824 ms
	NodeManager.deadline_timer.spill_objects_when_over_threshold - 620 total (1 active), Execution time: mean = 3.040 us, total = 1.885 ms, Queueing time: mean = 169.907 us, max = 2.205 ms, min = 6.247 us, total = 105.342 ms
	NodeManager.deadline_timer.flush_free_objects - 620 total (1 active), Execution time: mean = 7.922 us, total = 4.912 ms, Queueing time: mean = 166.580 us, max = 2.209 ms, min = 9.779 us, total = 103.280 ms
	NodeManagerService.grpc_server.GetResourceLoad - 620 total (0 active), Execution time: mean = 616.940 us, total = 382.503 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	ClusterResourceManager.ResetRemoteNodeView - 207 total (1 active), Execution time: mean = 7.768 us, total = 1.608 ms, Queueing time: mean = 72.296 us, max = 253.106 us, min = 11.307 us, total = 14.965 ms
	ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 124 total (0 active), Execution time: mean = 1.286 ms, total = 159.513 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	NodeManager.GcsCheckAlive - 124 total (1 active), Execution time: mean = 253.644 us, total = 31.452 ms, Queueing time: mean = 601.611 us, max = 2.274 ms, min = 115.311 us, total = 74.600 ms
	NodeManager.deadline_timer.record_metrics - 124 total (1 active), Execution time: mean = 516.252 us, total = 64.015 ms, Queueing time: mean = 340.401 us, max = 1.700 ms, min = 9.061 us, total = 42.210 ms
	ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 124 total (0 active), Execution time: mean = 47.576 us, total = 5.899 ms, Queueing time: mean = 97.622 us, max = 241.540 us, min = 14.505 us, total = 12.105 ms
	ClientConnection.async_read.ProcessMessageHeader - 96 total (21 active), Execution time: mean = 7.737 us, total = 742.799 us, Queueing time: mean = 12.598 s, max = 149.071 s, min = 27.575 us, total = 1209.432 s
	ClientConnection.async_read.ProcessMessage - 75 total (0 active), Execution time: mean = 806.725 us, total = 60.504 ms, Queueing time: mean = 67.160 us, max = 1.027 ms, min = 2.835 us, total = 5.037 ms
	NodeManager.deadline_timer.debug_state_dump - 62 total (1 active, 1 running), Execution time: mean = 1.638 ms, total = 101.551 ms, Queueing time: mean = 64.617 us, max = 196.608 us, min = 11.928 us, total = 4.006 ms
	ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.317 us, total = 28.982 us, Queueing time: mean = 49.767 us, max = 431.510 us, min = 17.047 us, total = 1.095 ms
	NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 154.352 us, total = 3.241 ms, Queueing time: mean = 162.625 us, max = 432.570 us, min = 33.451 us, total = 3.415 ms
	ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 11.466 us, total = 240.795 us, Queueing time: mean = 2.139 ms, max = 21.287 ms, min = 13.920 us, total = 44.925 ms
	ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 21.296 us, total = 447.224 us, Queueing time: mean = 186.718 us, max = 583.430 us, min = 33.345 us, total = 3.921 ms
	NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.630 ms, total = 34.233 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 217.267 us, total = 2.824 ms, Queueing time: mean = 2.965 ms, max = 8.958 ms, min = 32.982 us, total = 38.551 ms
	NodeManager.deadline_timer.print_event_loop_stats - 11 total (1 active), Execution time: mean = 2.383 ms, total = 26.214 ms, Queueing time: mean = 42.203 us, max = 107.871 us, min = 17.957 us, total = 464.235 us
	RaySyncer.BroadcastMessage - 10 total (0 active), Execution time: mean = 182.922 us, total = 1.829 ms, Queueing time: mean = 565.700 ns, max = 727.000 ns, min = 148.000 ns, total = 5.657 us
	 - 10 total (0 active), Execution time: mean = 928.300 ns, total = 9.283 us, Queueing time: mean = 76.912 us, max = 165.908 us, min = 23.770 us, total = 769.116 us
	NodeManagerService.grpc_server.RequestWorkerLease - 6 total (0 active), Execution time: mean = 776.617 us, total = 4.660 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 6 total (0 active), Execution time: mean = 229.578 us, total = 1.377 ms, Queueing time: mean = 99.063 us, max = 123.315 us, min = 37.134 us, total = 594.378 us
	NodeManagerService.grpc_server.ReturnWorker - 6 total (0 active), Execution time: mean = 538.559 us, total = 3.231 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 6 total (0 active), Execution time: mean = 95.401 us, total = 572.407 us, Queueing time: mean = 43.422 us, max = 140.746 us, min = 7.240 us, total = 260.529 us
	WorkerPool.PopWorkerCallback - 6 total (0 active), Execution time: mean = 47.279 us, total = 283.677 us, Queueing time: mean = 29.447 us, max = 38.510 us, min = 20.335 us, total = 176.684 us
	ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 3 total (1 active), Execution time: mean = 198.863 s, total = 596.590 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.083 us, total = 4.165 us, Queueing time: mean = 301.000 ns, max = 535.000 ns, min = 67.000 ns, total = 602.000 ns
	ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 2 total (0 active), Execution time: mean = 355.635 us, total = 711.270 us, Queueing time: mean = 123.462 us, max = 133.083 us, min = 113.841 us, total = 246.924 us
	ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 129.555 us, total = 259.110 us, Queueing time: mean = 655.112 us, max = 1.180 ms, min = 129.843 us, total = 1.310 ms
	ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.377 ms, total = 2.754 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.419 ms, total = 2.419 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 342.194 us, total = 342.194 us, Queueing time: mean = 163.766 us, max = 163.766 us, min = 163.766 us, total = 163.766 us
	Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 96.544 us, total = 96.544 us, Queueing time: mean = 315.750 us, max = 315.750 us, min = 315.750 us, total = 315.750 us
	ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.019 s, total = 1.019 s, Queueing time: mean = 90.737 us, max = 90.737 us, min = 90.737 us, total = 90.737 us
	ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.553 ms, total = 1.553 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.873 ms, total = 1.873 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 29.991 us, total = 29.991 us, Queueing time: mean = 111.550 us, max = 111.550 us, min = 111.550 us, total = 111.550 us
	ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.569 ms, total = 1.569 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.597 ms, total = 1.597 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 224.222 us, total = 224.222 us, Queueing time: mean = 119.308 us, max = 119.308 us, min = 119.308 us, total = 119.308 us
	ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 137.575 us, total = 137.575 us, Queueing time: mean = 36.079 us, max = 36.079 us, min = 36.079 us, total = 36.079 us
DebugString() time ms: 1