NodeManager:
Node ID: 436871bdc85bdc6a74a720eb95141ed73f9ec5b33bf63663a74609e3
Node name: 192.168.0.2
InitialConfigResources: {node:192.168.0.2: 10000, accelerator_type:A40: 10000, node:__internal_head__: 10000, memory: 577933393920000, object_store_memory: 288966696960000, CPU: 160000, GPU: 20000}
ClusterTaskManager:
========== Node: 436871bdc85bdc6a74a720eb95141ed73f9ec5b33bf63663a74609e3 =================
Infeasible queue length: 0
Schedule queue length: 0
Dispatch queue length: 0
num_waiting_for_resource: 0
num_waiting_for_plasma_memory: 0
num_waiting_for_remote_node_resources: 0
num_worker_not_started_by_job_config_not_exist: 0
num_worker_not_started_by_registration_timeout: 0
num_tasks_waiting_for_workers: 0
num_cancelled_tasks: 0
cluster_resource_scheduler state:
Local id: 5156797141345256205 Local resources: {"total":{GPU: [10000, 10000], CPU: [160000], object_store_memory: [288966696960000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], node:__internal_head__: [10000], memory: [577933393920000]}}, "available": {GPU: [10000, 10000], CPU: [160000], object_store_memory: [288966696960000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], node:__internal_head__: [10000], memory: [577933393920000]}}, "labels":{"ray.io/node_id":"436871bdc85bdc6a74a720eb95141ed73f9ec5b33bf63663a74609e3",} is_draining: 0 is_idle: 1 Cluster resources: node id: 5156797141345256205{"total":{node:192.168.0.2: 10000, accelerator_type:A40: 10000, memory: 577933393920000, node:__internal_head__: 10000, GPU: 20000, CPU: 160000, object_store_memory: 288966696960000}}, "available": {accelerator_type:A40: 10000, node:192.168.0.2: 10000, memory: 577933393920000, GPU: 20000, node:__internal_head__: 10000, CPU: 160000, object_store_memory: 288966696960000}}, "labels":{"ray.io/node_id":"436871bdc85bdc6a74a720eb95141ed73f9ec5b33bf63663a74609e3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []}
Waiting tasks size: 0
Number of executing tasks: 0
Number of pinned task arguments: 0
Number of total spilled tasks: 0
Number of spilled waiting tasks: 0
Number of spilled unschedulable tasks: 0
Resource usage {
}
Backlog Size per scheduling descriptor :{workerId: num backlogs}:
Running tasks by scheduling class:
==================================================
ClusterResources:
LocalObjectManager:
- num pinned objects: 0
- pinned objects size: 0
- num objects pending restore: 0
- num objects pending spill: 0
- num bytes pending spill: 0
- num bytes currently spilled: 0
- cumulative spill requests: 0
- cumulative restore requests: 0
- spilled objects pending delete: 0
ObjectManager:
- num local objects: 0
- num unfulfilled push requests: 0
- num object pull requests: 0
- num chunks received total: 0
- num chunks received failed (all): 0
- num chunks received failed / cancelled: 0
- num chunks received failed / plasma error: 0
Event stats:
Global stats: 0 total (0 active)
Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
Execution time: mean = -nan s, total = 0.000 s
Event stats:
PushManager:
- num pushes in flight: 0
- num chunks in flight: 0
- num chunks remaining: 0
- max chunks allowed: 409
OwnershipBasedObjectDirectory:
- num listeners: 0
- cumulative location updates: 0
- num location updates per second: 0.000
- num location lookups per second: 0.000
- num locations added per second: 0.000
- num locations removed per second: 0.000
BufferPool:
- create buffer state map size: 0
PullManager:
- num bytes available for pulled objects: 28896669696
- num bytes being pulled (all): 0
- num bytes being pulled / pinned: 0
- get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
- wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
- task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
- first get request bundle: N/A
- first wait request bundle: N/A
- first task request bundle: N/A
- num objects queued: 0
- num objects actively pulled (all): 0
- num objects actively pulled / pinned: 0
- num bundles being pulled: 0
- num pull retries: 0
- max timeout seconds: 0
- max timeout request is already processed. No entry.
WorkerPool:
- registered jobs: 1
- process_failed_job_config_missing: 0
- process_failed_rate_limited: 0
- process_failed_pending_registration: 0
- process_failed_runtime_env_setup_failed: 0
- num PYTHON workers: 16
- num PYTHON drivers: 1
- num PYTHON pending start requests: 0
- num PYTHON pending registration requests: 0
- num object spill callbacks queued: 0
- num object restore queued: 0
- num util functions queued: 0
- num idle workers: 16
TaskDependencyManager:
- task deps map size: 0
- get req map size: 0
- wait req map size: 0
- local objects map size: 0
WaitManager:
- num active wait requests: 0
Subscriber:
Channel WORKER_OBJECT_LOCATIONS_CHANNEL
- cumulative subscribe requests: 0
- cumulative unsubscribe requests: 0
- active subscribed publishers: 0
- cumulative published messages: 0
- cumulative processed messages: 0
Channel WORKER_OBJECT_EVICTION
- cumulative subscribe requests: 0
- cumulative unsubscribe requests: 0
- active subscribed publishers: 0
- cumulative published messages: 0
- cumulative processed messages: 0
Channel WORKER_REF_REMOVED_CHANNEL
- cumulative subscribe requests: 0
- cumulative unsubscribe requests: 0
- active subscribed publishers: 0
- cumulative published messages: 0
- cumulative processed messages: 0
num async plasma notifications: 0
Remote node managers:
Event stats:
Global stats: 16120 total (31 active)
Queueing time: mean = 514.358 us, max = 4.119 s, min = 53.000 ns, total = 8.291 s
Execution time: mean = 291.764 us, total = 4.703 s
Event stats:
NodeManagerService.grpc_server.ReportWorkerBacklog - 3400 total (0 active), Execution time: mean = 555.048 us, total = 1.887 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 3400 total (0 active), Execution time: mean = 39.332 us, total = 133.727 ms, Queueing time: mean = 114.196 us, max = 457.744 us, min = 4.360 us, total = 388.265 ms
NodeManager.CheckGC - 1999 total (1 active), Execution time: mean = 3.113 us, total = 6.222 ms, Queueing time: mean = 106.145 us, max = 7.776 ms, min = 12.553 us, total = 212.183 ms
RaySyncer.OnDemandBroadcasting - 1999 total (1 active), Execution time: mean = 11.670 us, total = 23.327 ms, Queueing time: mean = 98.620 us, max = 7.768 ms, min = 17.205 us, total = 197.141 ms
ObjectManager.UpdateAvailableMemory - 1999 total (0 active), Execution time: mean = 6.299 us, total = 12.591 ms, Queueing time: mean = 114.426 us, max = 448.428 us, min = 4.677 us, total = 228.738 ms
RayletWorkerPool.deadline_timer.kill_idle_workers - 1000 total (1 active), Execution time: mean = 20.088 us, total = 20.088 ms, Queueing time: mean = 77.705 us, max = 1.318 ms, min = 16.015 us, total = 77.705 ms
MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 799 total (1 active), Execution time: mean = 466.407 us, total = 372.660 ms, Queueing time: mean = 84.572 us, max = 5.629 ms, min = 9.673 us, total = 67.573 ms
NodeManager.ScheduleAndDispatchTasks - 201 total (1 active), Execution time: mean = 17.152 us, total = 3.448 ms, Queueing time: mean = 80.714 us, max = 2.335 ms, min = 10.081 us, total = 16.224 ms
NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 200 total (0 active), Execution time: mean = 113.744 us, total = 22.749 ms, Queueing time: mean = 113.433 us, max = 179.044 us, min = 28.435 us, total = 22.687 ms
NodeManager.deadline_timer.flush_free_objects - 200 total (1 active), Execution time: mean = 9.238 us, total = 1.848 ms, Queueing time: mean = 204.994 us, max = 3.852 ms, min = 10.003 us, total = 40.999 ms
NodeManagerService.grpc_server.GetResourceLoad - 200 total (0 active), Execution time: mean = 672.850 us, total = 134.570 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
NodeManager.deadline_timer.spill_objects_when_over_threshold - 200 total (1 active), Execution time: mean = 2.750 us, total = 549.916 us, Queueing time: mean = 209.745 us, max = 3.872 ms, min = 8.097 us, total = 41.949 ms
ClientConnection.async_read.ProcessMessageHeader - 73 total (17 active), Execution time: mean = 7.363 us, total = 537.491 us, Queueing time: mean = 93.740 ms, max = 4.119 s, min = 23.863 us, total = 6.843 s
ClusterResourceManager.ResetRemoteNodeView - 67 total (1 active), Execution time: mean = 9.474 us, total = 634.770 us, Queueing time: mean = 74.448 us, max = 138.971 us, min = 18.048 us, total = 4.988 ms
ClientConnection.async_read.ProcessMessage - 56 total (0 active), Execution time: mean = 853.118 us, total = 47.775 ms, Queueing time: mean = 50.305 us, max = 330.349 us, min = 4.637 us, total = 2.817 ms
ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 40 total (0 active), Execution time: mean = 1.531 ms, total = 61.224 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 40 total (0 active), Execution time: mean = 55.452 us, total = 2.218 ms, Queueing time: mean = 119.211 us, max = 174.670 us, min = 35.268 us, total = 4.768 ms
NodeManager.deadline_timer.record_metrics - 40 total (1 active), Execution time: mean = 582.699 us, total = 23.308 ms, Queueing time: mean = 405.189 us, max = 2.227 ms, min = 20.120 us, total = 16.208 ms
NodeManager.GcsCheckAlive - 40 total (1 active), Execution time: mean = 289.528 us, total = 11.581 ms, Queueing time: mean = 691.446 us, max = 2.620 ms, min = 90.244 us, total = 27.658 ms
NodeManager.deadline_timer.debug_state_dump - 20 total (1 active, 1 running), Execution time: mean = 1.874 ms, total = 37.485 ms, Queueing time: mean = 66.033 us, max = 128.814 us, min = 40.760 us, total = 1.321 ms
ClientConnection.async_write.DoAsyncWrites - 18 total (0 active), Execution time: mean = 1.183 us, total = 21.290 us, Queueing time: mean = 58.972 us, max = 350.163 us, min = 11.754 us, total = 1.061 ms
NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 17 total (0 active), Execution time: mean = 111.454 us, total = 1.895 ms, Queueing time: mean = 2.344 ms, max = 37.348 ms, min = 10.325 us, total = 39.845 ms
ObjectManager.ObjectAdded - 17 total (0 active), Execution time: mean = 10.989 us, total = 186.814 us, Queueing time: mean = 90.374 us, max = 201.039 us, min = 8.654 us, total = 1.536 ms
NodeManagerService.grpc_server.GetSystemConfig - 17 total (0 active), Execution time: mean = 3.105 ms, total = 52.788 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ObjectManager.ObjectDeleted - 17 total (0 active), Execution time: mean = 19.136 us, total = 325.310 us, Queueing time: mean = 150.300 us, max = 363.207 us, min = 38.732 us, total = 2.555 ms
PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 193.429 us, total = 2.515 ms, Queueing time: mean = 3.658 ms, max = 12.297 ms, min = 61.360 us, total = 47.557 ms
NodeManager.deadline_timer.print_event_loop_stats - 4 total (1 active), Execution time: mean = 2.249 ms, total = 8.995 ms, Queueing time: mean = 176.598 us, max = 599.439 us, min = 40.291 us, total = 706.391 us
- 3 total (0 active), Execution time: mean = 1.041 us, total = 3.122 us, Queueing time: mean = 173.118 us, max = 225.649 us, min = 112.440 us, total = 519.354 us
NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 3 total (0 active), Execution time: mean = 105.602 us, total = 316.805 us, Queueing time: mean = 94.684 us, max = 118.743 us, min = 66.960 us, total = 284.051 us
NodeManagerService.grpc_server.RequestWorkerLease - 3 total (0 active), Execution time: mean = 831.908 us, total = 2.496 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
WorkerPool.PopWorkerCallback - 3 total (0 active), Execution time: mean = 37.328 us, total = 111.985 us, Queueing time: mean = 70.069 us, max = 108.129 us, min = 36.205 us, total = 210.206 us
NodeManagerService.grpc_server.ReturnWorker - 3 total (0 active), Execution time: mean = 618.798 us, total = 1.856 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
RaySyncer.BroadcastMessage - 3 total (0 active), Execution time: mean = 171.243 us, total = 513.728 us, Queueing time: mean = 495.000 ns, max = 795.000 ns, min = 64.000 ns, total = 1.485 us
NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 3 total (0 active), Execution time: mean = 172.808 us, total = 518.424 us, Queueing time: mean = 157.745 us, max = 249.875 us, min = 104.417 us, total = 473.236 us
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 395.836 ms, total = 791.673 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.103 us, total = 4.205 us, Queueing time: mean = 207.500 ns, max = 362.000 ns, min = 53.000 ns, total = 415.000 ns
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.672 ms, total = 3.345 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 151.719 us, total = 303.438 us, Queueing time: mean = 634.301 us, max = 1.112 ms, min = 156.108 us, total = 1.269 ms
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.021 s, total = 1.021 s, Queueing time: mean = 115.896 us, max = 115.896 us, min = 115.896 us, total = 115.896 us
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 250.770 us, total = 250.770 us, Queueing time: mean = 167.708 us, max = 167.708 us, min = 167.708 us, total = 167.708 us
NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.936 ms, total = 1.936 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 110.445 us, total = 110.445 us, Queueing time: mean = 119.814 us, max = 119.814 us, min = 119.814 us, total = 119.814 us
Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 65.944 us, total = 65.944 us, Queueing time: mean = 245.340 us, max = 245.340 us, min = 245.340 us, total = 245.340 us
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 25.894 us, total = 25.894 us, Queueing time: mean = 128.891 us, max = 128.891 us, min = 128.891 us, total = 128.891 us
ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.896 ms, total = 1.896 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 213.138 us, total = 213.138 us, Queueing time: mean = 91.253 us, max = 91.253 us, min = 91.253 us, total = 91.253 us
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.687 ms, total = 1.687 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.171 ms, total = 1.171 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::JobInfoGcsService.grpc_client.ReportJobError.OnReplyReceived - 1 total (0 active), Execution time: mean = 57.772 us, total = 57.772 us, Queueing time: mean = 171.436 us, max = 171.436 us, min = 171.436 us, total = 171.436 us
ray::rpc::JobInfoGcsService.grpc_client.ReportJobError - 1 total (0 active), Execution time: mean = 1.637 ms, total = 1.637 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.350 ms, total = 1.350 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 633.811 us, total = 633.811 us, Queueing time: mean = 119.913 us, max = 119.913 us, min = 119.913 us, total = 119.913 us
DebugString() time ms: 2