NodeManager:
Node ID: df7ee78b4bc3ac4f066532a9ba0bb0c580f959d04b14b9c1859f7fa0
Node name: 192.168.0.2
InitialConfigResources: {CPU: 200000, memory: 849738305540000, node:192.168.0.2: 10000, accelerator_type:A40: 10000, GPU: 20000, object_store_memory: 21474836480000, node:__internal_head__: 10000}
ClusterTaskManager:
========== Node: df7ee78b4bc3ac4f066532a9ba0bb0c580f959d04b14b9c1859f7fa0 =================
Infeasible queue length: 0
Schedule queue length: 0
Dispatch queue length: 0
num_waiting_for_resource: 0
num_waiting_for_plasma_memory: 0
num_waiting_for_remote_node_resources: 0
num_worker_not_started_by_job_config_not_exist: 0
num_worker_not_started_by_registration_timeout: 0
num_tasks_waiting_for_workers: 0
num_cancelled_tasks: 0
cluster_resource_scheduler state:
Local id: 652636538163867454 Local resources: {"total":{node:__internal_head__: [10000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], CPU: [200000], memory: [849738305540000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {node:__internal_head__: [10000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], CPU: [200000], memory: [849738305540000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"df7ee78b4bc3ac4f066532a9ba0bb0c580f959d04b14b9c1859f7fa0",} is_draining: 0 is_idle: 1 Cluster resources: node id: 652636538163867454{"total":{GPU: 20000, memory: 849738305540000, CPU: 200000, object_store_memory: 21474836480000, node:__internal_head__: 10000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "available": {GPU: 20000, memory: 849738305540000, node:__internal_head__: 10000, object_store_memory: 21474836480000, CPU: 200000, accelerator_type:A40: 10000, node:192.168.0.2: 10000}}, "labels":{"ray.io/node_id":"df7ee78b4bc3ac4f066532a9ba0bb0c580f959d04b14b9c1859f7fa0",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []}
Waiting tasks size: 0
Number of executing tasks: 0
Number of pinned task arguments: 0
Number of total spilled tasks: 0
Number of spilled waiting tasks: 0
Number of spilled unschedulable tasks: 0
Resource usage {
}
Backlog Size per scheduling descriptor :{workerId: num backlogs}:
Running tasks by scheduling class:
==================================================
ClusterResources:
LocalObjectManager:
- num pinned objects: 0
- pinned objects size: 0
- num objects pending restore: 0
- num objects pending spill: 0
- num bytes pending spill: 0
- num bytes currently spilled: 0
- cumulative spill requests: 0
- cumulative restore requests: 0
- spilled objects pending delete: 0
ObjectManager:
- num local objects: 0
- num unfulfilled push requests: 0
- num object pull requests: 0
- num chunks received total: 0
- num chunks received failed (all): 0
- num chunks received failed / cancelled: 0
- num chunks received failed / plasma error: 0
Event stats:
Global stats: 0 total (0 active)
Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
Execution time: mean = -nan s, total = 0.000 s
Event stats:
PushManager:
- num pushes in flight: 0
- num chunks in flight: 0
- num chunks remaining: 0
- max chunks allowed: 409
OwnershipBasedObjectDirectory:
- num listeners: 0
- cumulative location updates: 0
- num location updates per second: 0.000
- num location lookups per second: 0.000
- num locations added per second: 0.000
- num locations removed per second: 0.000
BufferPool:
- create buffer state map size: 0
PullManager:
- num bytes available for pulled objects: 2147483648
- num bytes being pulled (all): 0
- num bytes being pulled / pinned: 0
- get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
- wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
- task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
- first get request bundle: N/A
- first wait request bundle: N/A
- first task request bundle: N/A
- num objects queued: 0
- num objects actively pulled (all): 0
- num objects actively pulled / pinned: 0
- num bundles being pulled: 0
- num pull retries: 0
- max timeout seconds: 0
- max timeout request is already processed. No entry.
WorkerPool:
- registered jobs: 1
- process_failed_job_config_missing: 0
- process_failed_rate_limited: 0
- process_failed_pending_registration: 0
- process_failed_runtime_env_setup_failed: 0
- num PYTHON workers: 20
- num PYTHON drivers: 1
- num PYTHON pending start requests: 0
- num PYTHON pending registration requests: 0
- num object spill callbacks queued: 0
- num object restore queued: 0
- num util functions queued: 0
- num idle workers: 20
TaskDependencyManager:
- task deps map size: 0
- get req map size: 0
- wait req map size: 0
- local objects map size: 0
WaitManager:
- num active wait requests: 0
Subscriber:
Channel WORKER_OBJECT_LOCATIONS_CHANNEL
- cumulative subscribe requests: 0
- cumulative unsubscribe requests: 0
- active subscribed publishers: 0
- cumulative published messages: 0
- cumulative processed messages: 0
Channel WORKER_REF_REMOVED_CHANNEL
- cumulative subscribe requests: 0
- cumulative unsubscribe requests: 0
- active subscribed publishers: 0
- cumulative published messages: 0
- cumulative processed messages: 0
Channel WORKER_OBJECT_EVICTION
- cumulative subscribe requests: 0
- cumulative unsubscribe requests: 0
- active subscribed publishers: 0
- cumulative published messages: 0
- cumulative processed messages: 0
num async plasma notifications: 0
Remote node managers:
Event stats:
Global stats: 1164 total (35 active)
Queueing time: mean = 882.789 us, max = 802.311 ms, min = 57.000 ns, total = 1.028 s
Execution time: mean = 979.010 us, total = 1.140 s
Event stats:
NodeManagerService.grpc_server.ReportWorkerBacklog - 210 total (0 active), Execution time: mean = 269.668 us, total = 56.630 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 210 total (0 active), Execution time: mean = 19.678 us, total = 4.132 ms, Queueing time: mean = 40.684 us, max = 562.645 us, min = 2.460 us, total = 8.544 ms
ObjectManager.UpdateAvailableMemory - 100 total (0 active), Execution time: mean = 3.578 us, total = 357.838 us, Queueing time: mean = 33.812 us, max = 134.128 us, min = 14.880 us, total = 3.381 ms
NodeManager.CheckGC - 100 total (1 active), Execution time: mean = 2.363 us, total = 236.271 us, Queueing time: mean = 37.632 us, max = 210.220 us, min = 13.599 us, total = 3.763 ms
RaySyncer.OnDemandBroadcasting - 100 total (1 active), Execution time: mean = 6.788 us, total = 678.760 us, Queueing time: mean = 34.169 us, max = 211.808 us, min = 15.450 us, total = 3.417 ms
ClientConnection.async_read.ProcessMessageHeader - 84 total (21 active), Execution time: mean = 5.779 us, total = 485.478 us, Queueing time: mean = 11.374 ms, max = 802.311 ms, min = 18.925 us, total = 955.416 ms
ClientConnection.async_read.ProcessMessage - 63 total (0 active), Execution time: mean = 751.226 us, total = 47.327 ms, Queueing time: mean = 89.619 us, max = 648.651 us, min = 3.423 us, total = 5.646 ms
RayletWorkerPool.deadline_timer.kill_idle_workers - 50 total (1 active), Execution time: mean = 8.933 us, total = 446.653 us, Queueing time: mean = 37.368 us, max = 77.404 us, min = 15.386 us, total = 1.868 ms
MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 40 total (1 active), Execution time: mean = 384.062 us, total = 15.362 ms, Queueing time: mean = 33.003 us, max = 70.333 us, min = 13.390 us, total = 1.320 ms
ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 990.045 ns, total = 21.781 us, Queueing time: mean = 35.949 us, max = 106.575 us, min = 14.308 us, total = 790.880 us
NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 984.462 us, total = 20.674 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 6.589 us, total = 138.359 us, Queueing time: mean = 104.430 us, max = 575.126 us, min = 11.183 us, total = 2.193 ms
ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 13.286 us, total = 279.008 us, Queueing time: mean = 145.941 us, max = 618.608 us, min = 19.844 us, total = 3.065 ms
NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 90.451 us, total = 1.899 ms, Queueing time: mean = 33.567 us, max = 213.432 us, min = 8.296 us, total = 704.912 us
PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 167.874 us, total = 2.182 ms, Queueing time: mean = 2.520 ms, max = 7.977 ms, min = 24.441 us, total = 32.757 ms
NodeManager.ScheduleAndDispatchTasks - 11 total (1 active), Execution time: mean = 7.595 us, total = 83.550 us, Queueing time: mean = 27.742 us, max = 67.218 us, min = 14.959 us, total = 305.164 us
NodeManager.deadline_timer.flush_free_objects - 10 total (1 active), Execution time: mean = 3.285 us, total = 32.851 us, Queueing time: mean = 89.287 us, max = 747.563 us, min = 15.720 us, total = 892.869 us
NodeManager.deadline_timer.spill_objects_when_over_threshold - 10 total (1 active), Execution time: mean = 2.189 us, total = 21.894 us, Queueing time: mean = 90.590 us, max = 748.971 us, min = 17.060 us, total = 905.904 us
NodeManagerService.grpc_server.GetResourceLoad - 10 total (0 active), Execution time: mean = 547.703 us, total = 5.477 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 10 total (0 active), Execution time: mean = 81.480 us, total = 814.798 us, Queueing time: mean = 32.979 us, max = 105.980 us, min = 12.917 us, total = 329.789 us
ClusterResourceManager.ResetRemoteNodeView - 4 total (1 active), Execution time: mean = 3.971 us, total = 15.883 us, Queueing time: mean = 18.888 us, max = 27.066 us, min = 22.402 us, total = 75.552 us
ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 2 total (0 active), Execution time: mean = 25.256 us, total = 50.511 us, Queueing time: mean = 15.526 us, max = 21.927 us, min = 9.125 us, total = 31.052 us
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 89.472 us, total = 178.943 us, Queueing time: mean = 675.398 us, max = 1.342 ms, min = 9.245 us, total = 1.351 ms
NodeManager.GcsCheckAlive - 2 total (1 active), Execution time: mean = 138.817 us, total = 277.634 us, Queueing time: mean = 164.706 us, max = 329.411 us, min = 329.411 us, total = 329.411 us
RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.268 us, total = 2.535 us, Queueing time: mean = 202.500 ns, max = 348.000 ns, min = 57.000 ns, total = 405.000 ns
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 720.242 us, total = 1.440 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 2 total (0 active), Execution time: mean = 634.042 us, total = 1.268 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
NodeManager.deadline_timer.record_metrics - 2 total (1 active), Execution time: mean = 401.781 us, total = 803.562 us, Queueing time: mean = 12.668 us, max = 25.335 us, min = 25.335 us, total = 25.335 us
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 478.141 ms, total = 956.281 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
RaySyncer.BroadcastMessage - 1 total (0 active), Execution time: mean = 52.571 us, total = 52.571 us, Queueing time: mean = 75.000 ns, max = 75.000 ns, min = 75.000 ns, total = 75.000 ns
NodeManager.deadline_timer.debug_state_dump - 1 total (1 active, 1 running), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
- 1 total (0 active), Execution time: mean = 317.000 ns, total = 317.000 ns, Queueing time: mean = 19.306 us, max = 19.306 us, min = 19.306 us, total = 19.306 us
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 161.934 us, total = 161.934 us, Queueing time: mean = 14.176 us, max = 14.176 us, min = 14.176 us, total = 14.176 us
ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 41.743 us, total = 41.743 us, Queueing time: mean = 163.965 us, max = 163.965 us, min = 163.965 us, total = 163.965 us
ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 748.712 us, total = 748.712 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 874.703 us, total = 874.703 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 61.375 us, total = 61.375 us, Queueing time: mean = 194.268 us, max = 194.268 us, min = 194.268 us, total = 194.268 us
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 16.732 ms, total = 16.732 ms, Queueing time: mean = 12.747 us, max = 12.747 us, min = 12.747 us, total = 12.747 us
NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 619.132 us, total = 619.132 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
NodeManager.deadline_timer.print_event_loop_stats - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 1.199 ms, total = 1.199 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.156 ms, total = 1.156 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 203.966 us, total = 203.966 us, Queueing time: mean = 10.832 us, max = 10.832 us, min = 10.832 us, total = 10.832 us
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 9.492 us, total = 9.492 us, Queueing time: mean = 9.578 us, max = 9.578 us, min = 9.578 us, total = 9.578 us
ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 105.229 us, total = 105.229 us, Queueing time: mean = 29.033 us, max = 29.033 us, min = 29.033 us, total = 29.033 us
DebugString() time ms: 1