NodeManager:
Node ID: 13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf
Node name: 192.168.0.2
InitialConfigResources: {node:192.168.0.2: 10000, accelerator_type:A40: 10000, node:__internal_head__: 10000, memory: 844922429440000, object_store_memory: 21474836480000, CPU: 200000, GPU: 20000}
ClusterTaskManager:
========== Node: 13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf =================
Infeasible queue length: 0
Schedule queue length: 0
Dispatch queue length: 0
num_waiting_for_resource: 0
num_waiting_for_plasma_memory: 0
num_waiting_for_remote_node_resources: 0
num_worker_not_started_by_job_config_not_exist: 0
num_worker_not_started_by_registration_timeout: 0
num_tasks_waiting_for_workers: 0
num_cancelled_tasks: 0
cluster_resource_scheduler state: 
Local id: -2158256074887862688 Local resources: {"total":{node:__internal_head__: [10000], accelerator_type:A40: [10000], GPU: [10000, 10000], CPU: [200000], memory: [844922429440000], object_store_memory: [21474836480000], node:192.168.0.2: [10000]}}, "available": {node:__internal_head__: [10000], accelerator_type:A40: [10000], GPU: [10000, 10000], CPU: [200000], memory: [844922429440000], object_store_memory: [21474836480000], node:192.168.0.2: [10000]}}, "labels":{"ray.io/node_id":"13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf",} is_draining: 0 is_idle: 1 Cluster resources: node id: -2158256074887862688{"total":{object_store_memory: 21474836480000, CPU: 200000, accelerator_type:A40: 10000, GPU: 20000, node:192.168.0.2: 10000, node:__internal_head__: 10000, memory: 844922429440000}}, "available": {object_store_memory: 21474836480000, CPU: 200000, memory: 844922429440000, GPU: 20000, node:192.168.0.2: 10000, accelerator_type:A40: 10000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"13be7277f830f5a8b967d2a0092091c94c7576cfebf8a5fa66025fcf",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []}
Waiting tasks size: 0
Number of executing tasks: 0
Number of pinned task arguments: 0
Number of total spilled tasks: 0
Number of spilled waiting tasks: 0
Number of spilled unschedulable tasks: 0
Resource usage {
}
Backlog Size per scheduling descriptor :{workerId: num backlogs}:

Running tasks by scheduling class:
==================================================

ClusterResources:
LocalObjectManager:
- num pinned objects: 0
- pinned objects size: 0
- num objects pending restore: 0
- num objects pending spill: 0
- num bytes pending spill: 0
- num bytes currently spilled: 0
- cumulative spill requests: 0
- cumulative restore requests: 0
- spilled objects pending delete: 0

ObjectManager:
- num local objects: 0
- num unfulfilled push requests: 0
- num object pull requests: 0
- num chunks received total: 0
- num chunks received failed (all): 0
- num chunks received failed / cancelled: 0
- num chunks received failed / plasma error: 0
Event stats:
Global stats: 0 total (0 active)
Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
Execution time:  mean = -nan s, total = 0.000 s
Event stats:
PushManager:
- num pushes in flight: 0
- num chunks in flight: 0
- num chunks remaining: 0
- max chunks allowed: 409
OwnershipBasedObjectDirectory:
- num listeners: 0
- cumulative location updates: 0
- num location updates per second: 0.000
- num location lookups per second: 0.000
- num locations added per second: 0.000
- num locations removed per second: 0.000
BufferPool:
- create buffer state map size: 0
PullManager:
- num bytes available for pulled objects: 2147483648
- num bytes being pulled (all): 0
- num bytes being pulled / pinned: 0
- get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
- wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
- task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
- first get request bundle: N/A
- first wait request bundle: N/A
- first task request bundle: N/A
- num objects queued: 0
- num objects actively pulled (all): 0
- num objects actively pulled / pinned: 0
- num bundles being pulled: 0
- num pull retries: 0
- max timeout seconds: 0
- max timeout request is already processed. No entry.

WorkerPool:
- registered jobs: 1
- process_failed_job_config_missing: 0
- process_failed_rate_limited: 0
- process_failed_pending_registration: 0
- process_failed_runtime_env_setup_failed: 0
- num PYTHON workers: 20
- num PYTHON drivers: 1
- num PYTHON pending start requests: 0
- num PYTHON pending registration requests: 0
- num object spill callbacks queued: 0
- num object restore queued: 0
- num util functions queued: 0
- num idle workers: 20
TaskDependencyManager:
- task deps map size: 0
- get req map size: 0
- wait req map size: 0
- local objects map size: 0
WaitManager:
- num active wait requests: 0
Subscriber:
Channel WORKER_OBJECT_LOCATIONS_CHANNEL
- cumulative subscribe requests: 0
- cumulative unsubscribe requests: 0
- active subscribed publishers: 0
- cumulative published messages: 0
- cumulative processed messages: 0
Channel WORKER_OBJECT_EVICTION
- cumulative subscribe requests: 0
- cumulative unsubscribe requests: 0
- active subscribed publishers: 0
- cumulative published messages: 0
- cumulative processed messages: 0
Channel WORKER_REF_REMOVED_CHANNEL
- cumulative subscribe requests: 0
- cumulative unsubscribe requests: 0
- active subscribed publishers: 0
- cumulative published messages: 0
- cumulative processed messages: 0
num async plasma notifications: 0
Remote node managers: 
Event stats:
Global stats: 45735 total (35 active)
Queueing time: mean = 22.619 ms, max = 123.051 s, min = 57.000 ns, total = 1034.490 s
Execution time:  mean = 228.250 us, total = 10.439 s
Event stats:
	NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 10920 total (0 active), Execution time: mean = 38.910 us, total = 424.898 ms, Queueing time: mean = 118.335 us, max = 26.128 ms, min = 5.488 us, total = 1.292 s
	NodeManagerService.grpc_server.ReportWorkerBacklog - 10920 total (0 active), Execution time: mean = 555.357 us, total = 6.064 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	RaySyncer.OnDemandBroadcasting - 5196 total (1 active), Execution time: mean = 11.697 us, total = 60.779 ms, Queueing time: mean = 98.022 us, max = 28.199 ms, min = 12.241 us, total = 509.320 ms
	ObjectManager.UpdateAvailableMemory - 5196 total (0 active), Execution time: mean = 6.196 us, total = 32.192 ms, Queueing time: mean = 116.465 us, max = 706.852 us, min = 3.115 us, total = 605.153 ms
	NodeManager.CheckGC - 5196 total (1 active), Execution time: mean = 3.059 us, total = 15.895 ms, Queueing time: mean = 105.703 us, max = 28.206 ms, min = 6.199 us, total = 549.234 ms
	RayletWorkerPool.deadline_timer.kill_idle_workers - 2600 total (1 active), Execution time: mean = 19.169 us, total = 49.838 ms, Queueing time: mean = 78.963 us, max = 1.689 ms, min = 11.310 us, total = 205.304 ms
	MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 2076 total (1 active), Execution time: mean = 460.779 us, total = 956.577 ms, Queueing time: mean = 89.006 us, max = 23.328 ms, min = 9.730 us, total = 184.776 ms
	NodeManager.ScheduleAndDispatchTasks - 521 total (1 active), Execution time: mean = 15.483 us, total = 8.066 ms, Queueing time: mean = 83.815 us, max = 2.235 ms, min = 5.788 us, total = 43.668 ms
	NodeManager.deadline_timer.spill_objects_when_over_threshold - 520 total (1 active), Execution time: mean = 2.884 us, total = 1.500 ms, Queueing time: mean = 184.971 us, max = 2.259 ms, min = 6.405 us, total = 96.185 ms
	NodeManager.deadline_timer.flush_free_objects - 520 total (1 active), Execution time: mean = 8.940 us, total = 4.649 ms, Queueing time: mean = 180.920 us, max = 2.264 ms, min = 8.861 us, total = 94.079 ms
	NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 520 total (0 active), Execution time: mean = 102.406 us, total = 53.251 ms, Queueing time: mean = 117.319 us, max = 492.037 us, min = 16.203 us, total = 61.006 ms
	NodeManagerService.grpc_server.GetResourceLoad - 520 total (0 active), Execution time: mean = 636.948 us, total = 331.213 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	ClusterResourceManager.ResetRemoteNodeView - 174 total (1 active), Execution time: mean = 8.947 us, total = 1.557 ms, Queueing time: mean = 76.749 us, max = 253.487 us, min = 16.912 us, total = 13.354 ms
	NodeManager.GcsCheckAlive - 104 total (1 active), Execution time: mean = 286.918 us, total = 29.839 ms, Queueing time: mean = 642.545 us, max = 2.306 ms, min = 99.479 us, total = 66.825 ms
	ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 104 total (0 active), Execution time: mean = 53.596 us, total = 5.574 ms, Queueing time: mean = 117.882 us, max = 218.783 us, min = 27.880 us, total = 12.260 ms
	ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 104 total (0 active), Execution time: mean = 1.519 ms, total = 157.936 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	NodeManager.deadline_timer.record_metrics - 104 total (1 active), Execution time: mean = 552.821 us, total = 57.493 ms, Queueing time: mean = 379.650 us, max = 1.725 ms, min = 8.885 us, total = 39.484 ms
	ClientConnection.async_read.ProcessMessageHeader - 95 total (21 active), Execution time: mean = 8.155 us, total = 774.725 us, Queueing time: mean = 10.849 s, max = 123.051 s, min = 33.517 us, total = 1030.640 s
	ClientConnection.async_read.ProcessMessage - 74 total (0 active), Execution time: mean = 929.411 us, total = 68.776 ms, Queueing time: mean = 43.207 us, max = 369.478 us, min = 3.543 us, total = 3.197 ms
	NodeManager.deadline_timer.debug_state_dump - 52 total (1 active, 1 running), Execution time: mean = 1.791 ms, total = 93.132 ms, Queueing time: mean = 66.948 us, max = 153.123 us, min = 21.807 us, total = 3.481 ms
	ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.574 us, total = 34.627 us, Queueing time: mean = 41.270 us, max = 146.493 us, min = 11.116 us, total = 907.929 us
	NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 121.458 us, total = 2.551 ms, Queueing time: mean = 94.581 us, max = 235.076 us, min = 11.959 us, total = 1.986 ms
	ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 12.500 us, total = 262.503 us, Queueing time: mean = 106.328 us, max = 212.914 us, min = 9.723 us, total = 2.233 ms
	NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 758.936 us, total = 15.938 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 19.347 us, total = 406.289 us, Queueing time: mean = 139.222 us, max = 517.588 us, min = 35.214 us, total = 2.924 ms
	PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 197.633 us, total = 2.569 ms, Queueing time: mean = 4.308 ms, max = 15.027 ms, min = 25.785 us, total = 56.007 ms
	NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 780.093 us, total = 7.801 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	 - 10 total (0 active), Execution time: mean = 1.025 us, total = 10.254 us, Queueing time: mean = 114.099 us, max = 182.528 us, min = 27.587 us, total = 1.141 ms
	RaySyncer.BroadcastMessage - 10 total (0 active), Execution time: mean = 221.900 us, total = 2.219 ms, Queueing time: mean = 695.800 ns, max = 931.000 ns, min = 70.000 ns, total = 6.958 us
	WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 21.932 us, total = 219.320 us, Queueing time: mean = 109.091 us, max = 200.547 us, min = 20.707 us, total = 1.091 ms
	NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 10 total (0 active), Execution time: mean = 123.767 us, total = 1.238 ms, Queueing time: mean = 96.345 us, max = 137.794 us, min = 36.307 us, total = 963.446 us
	NodeManagerService.grpc_server.ReturnWorker - 10 total (0 active), Execution time: mean = 635.791 us, total = 6.358 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 115.576 us, total = 1.156 ms, Queueing time: mean = 115.782 us, max = 331.820 us, min = 13.069 us, total = 1.158 ms
	NodeManager.deadline_timer.print_event_loop_stats - 9 total (1 active), Execution time: mean = 2.059 ms, total = 18.527 ms, Queueing time: mean = 54.393 us, max = 111.886 us, min = 25.166 us, total = 489.536 us
	ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.646 ms, total = 3.291 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 460.725 ms, total = 921.449 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.800 us, total = 3.599 us, Queueing time: mean = 209.500 ns, max = 362.000 ns, min = 57.000 ns, total = 419.000 ns
	ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 129.215 us, total = 258.430 us, Queueing time: mean = 412.096 us, max = 685.035 us, min = 139.158 us, total = 824.193 us
	ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.759 ms, total = 1.759 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.586 ms, total = 1.586 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.546 ms, total = 1.546 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.263 ms, total = 2.263 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 132.115 us, total = 132.115 us, Queueing time: mean = 203.939 us, max = 203.939 us, min = 203.939 us, total = 203.939 us
	ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 407.271 us, total = 407.271 us, Queueing time: mean = 44.821 us, max = 44.821 us, min = 44.821 us, total = 44.821 us
	ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.026 s, total = 1.026 s, Queueing time: mean = 109.121 us, max = 109.121 us, min = 109.121 us, total = 109.121 us
	ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 29.389 us, total = 29.389 us, Queueing time: mean = 150.068 us, max = 150.068 us, min = 150.068 us, total = 150.068 us
	Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 86.446 us, total = 86.446 us, Queueing time: mean = 403.833 us, max = 403.833 us, min = 403.833 us, total = 403.833 us
	NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 327.529 us, total = 327.529 us, Queueing time: mean = 115.620 us, max = 115.620 us, min = 115.620 us, total = 115.620 us
	ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 60.113 us, total = 60.113 us, Queueing time: mean = 447.798 us, max = 447.798 us, min = 447.798 us, total = 447.798 us
	ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.740 ms, total = 1.740 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
DebugString() time ms: 1