JayKimDevolved/deepseek
c011401 verified
NodeManager:
Node ID: 1cc7243d4d7faf0b5672664c331eda22d6e6a5d17cce88079d187efc
Node name: 192.168.0.2
InitialConfigResources: {GPU: 20000, node:192.168.0.2: 10000, CPU: 960000, object_store_memory: 42949672960000, accelerator_type:A40: 10000, memory: 21474836480000, node:__internal_head__: 10000}
ClusterTaskManager:
========== Node: 1cc7243d4d7faf0b5672664c331eda22d6e6a5d17cce88079d187efc =================
Infeasible queue length: 0
Schedule queue length: 0
Dispatch queue length: 0
num_waiting_for_resource: 0
num_waiting_for_plasma_memory: 0
num_waiting_for_remote_node_resources: 0
num_worker_not_started_by_job_config_not_exist: 0
num_worker_not_started_by_registration_timeout: 0
num_tasks_waiting_for_workers: 0
num_cancelled_tasks: 0
cluster_resource_scheduler state:
Local id: -9171529465629905244 Local resources: {"total":{GPU: [10000, 10000], CPU: [960000], object_store_memory: [42949672960000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], node:__internal_head__: [10000], memory: [21474836480000]}}, "available": {GPU: [10000, 10000], CPU: [950000], object_store_memory: [41330736250000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], node:__internal_head__: [10000], memory: [21474836480000]}}, "labels":{"ray.io/node_id":"1cc7243d4d7faf0b5672664c331eda22d6e6a5d17cce88079d187efc",} is_draining: 0 is_idle: 0 Cluster resources: node id: -9171529465629905244{"total":{CPU: 960000, memory: 21474836480000, accelerator_type:A40: 10000, node:__internal_head__: 10000, object_store_memory: 42949672960000, node:192.168.0.2: 10000, GPU: 20000}}, "available": {node:__internal_head__: 10000, GPU: 20000, object_store_memory: 41330736250000, CPU: 950000, node:192.168.0.2: 10000, memory: 21474836480000, accelerator_type:A40: 10000}}, "labels":{"ray.io/node_id":"1cc7243d4d7faf0b5672664c331eda22d6e6a5d17cce88079d187efc",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []}
Waiting tasks size: 0
Number of executing tasks: 1
Number of pinned task arguments: 1
Number of total spilled tasks: 0
Number of spilled waiting tasks: 0
Number of spilled unschedulable tasks: 0
Resource usage {
- (language=PYTHON actor_or_task=AutoscalingRequester.__init__ pid=9993 worker_id=522a0f4099acdc01500b9ffb59119de77306126837ba24d26ad2af7f): {}
- (language=PYTHON actor_or_task=_split_single_block pid=9980 worker_id=e452c28a091ba3e96eea4862e004f22bc03a3e52b7d9dc4769c5199b): {CPU: 10000}
- (language=PYTHON actor_or_task=_StatsActor.__init__ pid=9936 worker_id=3bb4461a97785cb0d8c35531d6b1b95ed047f5f21c22790bbb90baa2): {}
}
Backlog Size per scheduling descriptor :{workerId: num backlogs}:
Running tasks by scheduling class:
- {depth=1 function_descriptor={type=PythonFunctionDescriptor, module_name=ray.data._internal.split, class_name=, function_name=_split_single_block, function_hash=8a2c78bb1c1f424692402153f47e6a04} scheduling_strategy=spread_scheduling_strategy {
}
resource_set={CPU : 1, }}: 1/96
==================================================
ClusterResources:
LocalObjectManager:
- num pinned objects: 31
- pinned objects size: 161893671
- num objects pending restore: 0
- num objects pending spill: 0
- num bytes pending spill: 0
- num bytes currently spilled: 0
- cumulative spill requests: 0
- cumulative restore requests: 0
- spilled objects pending delete: 0
ObjectManager:
- num local objects: 40
- num unfulfilled push requests: 0
- num object pull requests: 0
- num chunks received total: 0
- num chunks received failed (all): 0
- num chunks received failed / cancelled: 0
- num chunks received failed / plasma error: 0
Event stats:
Global stats: 20 total (0 active)
Queueing time: mean = 149.245 us, max = 1.090 ms, min = 23.985 us, total = 2.985 ms
Execution time: mean = 81.061 us, total = 1.621 ms
Event stats:
ObjectManager.FreeObjects - 20 total (0 active), Execution time: mean = 81.061 us, total = 1.621 ms, Queueing time: mean = 149.245 us, max = 1.090 ms, min = 23.985 us, total = 2.985 ms
PushManager:
- num pushes in flight: 0
- num chunks in flight: 0
- num chunks remaining: 0
- max chunks allowed: 409
OwnershipBasedObjectDirectory:
- num listeners: 0
- cumulative location updates: 180
- num location updates per second: 9.998
- num location lookups per second: 0.000
- num locations added per second: 13.997
- num locations removed per second: 12.597
BufferPool:
- create buffer state map size: 0
PullManager:
- num bytes available for pulled objects: 4133073625
- num bytes being pulled (all): 0
- num bytes being pulled / pinned: 0
- get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
- wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
- task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
- first get request bundle: N/A
- first wait request bundle: N/A
- first task request bundle: N/A
- num objects queued: 0
- num objects actively pulled (all): 0
- num objects actively pulled / pinned: 0
- num bundles being pulled: 0
- num pull retries: 0
- max timeout seconds: 0
- max timeout request is already processed. No entry.
WorkerPool:
- registered jobs: 1
- process_failed_job_config_missing: 0
- process_failed_rate_limited: 0
- process_failed_pending_registration: 0
- process_failed_runtime_env_setup_failed: 0
- num PYTHON workers: 96
- num PYTHON drivers: 1
- num PYTHON pending start requests: 0
- num PYTHON pending registration requests: 0
- num object spill callbacks queued: 0
- num object restore queued: 0
- num util functions queued: 0
- num idle workers: 93
TaskDependencyManager:
- task deps map size: 0
- get req map size: 0
- wait req map size: 0
- local objects map size: 40
WaitManager:
- num active wait requests: 0
Subscriber:
Channel WORKER_OBJECT_LOCATIONS_CHANNEL
- cumulative subscribe requests: 1833
- cumulative unsubscribe requests: 1841
- active subscribed publishers: 0
- cumulative published messages: 328
- cumulative processed messages: 285
Channel WORKER_OBJECT_EVICTION
- cumulative subscribe requests: 328
- cumulative unsubscribe requests: 297
- active subscribed publishers: 11
- cumulative published messages: 297
- cumulative processed messages: 297
Channel WORKER_REF_REMOVED_CHANNEL
- cumulative subscribe requests: 0
- cumulative unsubscribe requests: 0
- active subscribed publishers: 0
- cumulative published messages: 0
- cumulative processed messages: 0
num async plasma notifications: 0
Remote node managers:
Event stats:
Global stats: 38583 total (127 active)
Queueing time: mean = 36.040 ms, max = 14.619 s, min = 70.000 ns, total = 1390.525 s
Execution time: mean = 6.773 ms, total = 261.316 s
Event stats:
ClientConnection.async_read.ProcessMessageHeader - 5556 total (97 active), Execution time: mean = 7.178 us, total = 39.879 ms, Queueing time: mean = 242.889 ms, max = 14.619 s, min = 3.492 us, total = 1349.489 s
ClientConnection.async_read.ProcessMessage - 5459 total (0 active), Execution time: mean = 441.788 us, total = 2.412 s, Queueing time: mean = 165.277 us, max = 76.861 ms, min = 1.607 us, total = 902.249 ms
NodeManagerService.grpc_server.ReportWorkerBacklog - 3288 total (0 active), Execution time: mean = 2.590 ms, total = 8.516 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 3288 total (0 active), Execution time: mean = 52.115 us, total = 171.353 ms, Queueing time: mean = 1.960 ms, max = 101.116 ms, min = 1.921 us, total = 6.445 s
CoreWorkerService.grpc_client.PubsubCommandBatch - 2115 total (0 active), Execution time: mean = 2.333 ms, total = 4.935 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
CoreWorkerService.grpc_client.PubsubCommandBatch.OnReplyReceived - 2115 total (0 active), Execution time: mean = 203.484 us, total = 430.369 ms, Queueing time: mean = 714.201 us, max = 98.358 ms, min = 4.710 us, total = 1.511 s
WorkerPool.PopWorkerCallback - 1688 total (0 active), Execution time: mean = 136.076 us, total = 229.696 ms, Queueing time: mean = 2.057 ms, max = 95.961 ms, min = 11.546 us, total = 3.473 s
NodeManagerService.grpc_server.RequestWorkerLease - 1688 total (0 active), Execution time: mean = 7.644 ms, total = 12.902 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 1688 total (0 active), Execution time: mean = 1.539 ms, total = 2.598 s, Queueing time: mean = 3.367 ms, max = 96.533 ms, min = 2.364 us, total = 5.684 s
NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 1682 total (0 active), Execution time: mean = 111.540 us, total = 187.611 ms, Queueing time: mean = 747.378 us, max = 93.499 ms, min = 3.504 us, total = 1.257 s
NodeManagerService.grpc_server.ReturnWorker - 1682 total (0 active), Execution time: mean = 1.261 ms, total = 2.120 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
CoreWorkerService.grpc_client.PubsubLongPolling - 1445 total (16 active), Execution time: mean = 147.874 ms, total = 213.678 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
CoreWorkerService.grpc_client.PubsubLongPolling.OnReplyReceived - 1429 total (0 active), Execution time: mean = 281.293 us, total = 401.968 ms, Queueing time: mean = 899.345 us, max = 79.771 ms, min = 3.367 us, total = 1.285 s
ObjectManager.ObjectAdded - 425 total (0 active), Execution time: mean = 177.392 us, total = 75.391 ms, Queueing time: mean = 1.010 ms, max = 93.462 ms, min = 11.090 us, total = 429.231 ms
CoreWorkerService.grpc_client.UpdateObjectLocationBatch.OnReplyReceived - 395 total (0 active), Execution time: mean = 42.338 us, total = 16.724 ms, Queueing time: mean = 429.382 us, max = 66.733 ms, min = 4.539 us, total = 169.606 ms
CoreWorkerService.grpc_client.UpdateObjectLocationBatch - 395 total (0 active), Execution time: mean = 2.602 ms, total = 1.028 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ObjectManager.ObjectDeleted - 385 total (0 active), Execution time: mean = 41.419 us, total = 15.946 ms, Queueing time: mean = 766.344 us, max = 75.572 ms, min = 22.945 us, total = 295.043 ms
RaySyncer.OnDemandBroadcasting - 373 total (1 active), Execution time: mean = 193.804 us, total = 72.289 ms, Queueing time: mean = 7.316 ms, max = 1.979 s, min = 13.905 us, total = 2.729 s
NodeManager.CheckGC - 373 total (1 active), Execution time: mean = 3.454 us, total = 1.288 ms, Queueing time: mean = 7.504 ms, max = 1.979 s, min = 14.333 us, total = 2.799 s
ObjectManager.UpdateAvailableMemory - 373 total (0 active), Execution time: mean = 4.527 us, total = 1.689 ms, Queueing time: mean = 872.621 us, max = 67.514 ms, min = 4.755 us, total = 325.488 ms
NodeManagerService.grpc_server.PinObjectIDs.HandleRequestImpl - 328 total (0 active), Execution time: mean = 440.264 us, total = 144.406 ms, Queueing time: mean = 1.709 ms, max = 93.173 ms, min = 3.629 us, total = 560.680 ms
NodeManagerService.grpc_server.PinObjectIDs - 328 total (0 active), Execution time: mean = 2.506 ms, total = 821.869 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
Subscriber.HandlePublishedMessage_WORKER_OBJECT_EVICTION - 297 total (0 active), Execution time: mean = 59.764 us, total = 17.750 ms, Queueing time: mean = 2.700 ms, max = 90.866 ms, min = 61.627 us, total = 801.848 ms
Subscriber.HandlePublishedMessage_WORKER_OBJECT_LOCATIONS_CHANNEL - 285 total (0 active), Execution time: mean = 15.477 us, total = 4.411 ms, Queueing time: mean = 2.698 ms, max = 95.909 ms, min = 66.101 us, total = 769.052 ms
RaySyncer.BroadcastMessage - 235 total (0 active), Execution time: mean = 172.432 us, total = 40.522 ms, Queueing time: mean = 682.336 ns, max = 1.340 us, min = 209.000 ns, total = 160.349 us
- 235 total (0 active), Execution time: mean = 890.315 ns, total = 209.224 us, Queueing time: mean = 1.553 ms, max = 67.558 ms, min = 4.418 us, total = 364.909 ms
RayletWorkerPool.deadline_timer.kill_idle_workers - 189 total (1 active), Execution time: mean = 28.223 us, total = 5.334 ms, Queueing time: mean = 12.363 ms, max = 1.988 s, min = 16.585 us, total = 2.337 s
MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 148 total (1 active), Execution time: mean = 450.788 us, total = 66.717 ms, Queueing time: mean = 20.552 ms, max = 1.938 s, min = 18.248 us, total = 3.042 s
ClientConnection.async_write.DoAsyncWrites - 98 total (0 active), Execution time: mean = 1.794 us, total = 175.840 us, Queueing time: mean = 60.740 us, max = 1.425 ms, min = 13.271 us, total = 5.952 ms
NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 97 total (0 active), Execution time: mean = 81.595 us, total = 7.915 ms, Queueing time: mean = 250.213 us, max = 6.134 ms, min = 18.494 us, total = 24.271 ms
NodeManagerService.grpc_server.GetSystemConfig - 97 total (0 active), Execution time: mean = 737.393 us, total = 71.527 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
NodeManagerService.grpc_server.CancelWorkerLease.HandleRequestImpl - 57 total (0 active), Execution time: mean = 30.230 us, total = 1.723 ms, Queueing time: mean = 1.456 ms, max = 29.374 ms, min = 12.719 us, total = 82.982 ms
NodeManagerService.grpc_server.CancelWorkerLease - 57 total (0 active), Execution time: mean = 1.918 ms, total = 109.302 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 40 total (0 active), Execution time: mean = 134.159 us, total = 5.366 ms, Queueing time: mean = 42.342 ms, max = 1.291 s, min = 8.781 us, total = 1.694 s
NodeManagerService.grpc_server.GetResourceLoad - 40 total (0 active), Execution time: mean = 42.951 ms, total = 1.718 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
NodeManager.ScheduleAndDispatchTasks - 39 total (1 active), Execution time: mean = 13.150 us, total = 512.867 us, Queueing time: mean = 31.651 ms, max = 1.189 s, min = 32.626 us, total = 1.234 s
NodeManager.deadline_timer.spill_objects_when_over_threshold - 39 total (1 active), Execution time: mean = 3.322 us, total = 129.543 us, Queueing time: mean = 31.544 ms, max = 1.184 s, min = 28.135 us, total = 1.230 s
NodeManager.deadline_timer.flush_free_objects - 39 total (1 active), Execution time: mean = 204.164 us, total = 7.962 ms, Queueing time: mean = 31.345 ms, max = 1.184 s, min = 20.774 us, total = 1.222 s
ClusterResourceManager.ResetRemoteNodeView - 14 total (1 active), Execution time: mean = 9.558 us, total = 133.813 us, Queueing time: mean = 9.019 ms, max = 84.332 ms, min = 25.352 us, total = 126.269 ms
PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 293.161 us, total = 3.811 ms, Queueing time: mean = 4.727 ms, max = 14.428 ms, min = 42.640 us, total = 61.455 ms
ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 8 total (0 active), Execution time: mean = 49.559 us, total = 396.468 us, Queueing time: mean = 129.743 us, max = 286.435 us, min = 28.310 us, total = 1.038 ms
NodeManager.deadline_timer.record_metrics - 8 total (1 active), Execution time: mean = 1.446 ms, total = 11.568 ms, Queueing time: mean = 9.988 ms, max = 79.249 ms, min = 23.940 us, total = 79.903 ms
ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 8 total (0 active), Execution time: mean = 1.400 ms, total = 11.200 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
Subscriber.HandleFailureCallback_WORKER_OBJECT_LOCATIONS_CHANNEL - 8 total (0 active), Execution time: mean = 310.942 us, total = 2.488 ms, Queueing time: mean = 319.207 us, max = 601.596 us, min = 110.037 us, total = 2.554 ms
NodeManager.GcsCheckAlive - 8 total (1 active), Execution time: mean = 364.459 us, total = 2.916 ms, Queueing time: mean = 11.014 ms, max = 85.229 ms, min = 28.367 us, total = 88.109 ms
NodeManager.deadline_timer.debug_state_dump - 4 total (1 active, 1 running), Execution time: mean = 1.574 ms, total = 6.296 ms, Queueing time: mean = 63.189 us, max = 165.893 us, min = 42.272 us, total = 252.756 us
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 3.690 s, total = 7.380 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.089 us, total = 4.178 us, Queueing time: mean = 286.000 ns, max = 502.000 ns, min = 70.000 ns, total = 572.000 ns
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.661 ms, total = 3.323 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 107.998 us, total = 215.996 us, Queueing time: mean = 1.366 ms, max = 2.440 ms, min = 291.935 us, total = 2.732 ms
NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 400.463 us, total = 400.463 us, Queueing time: mean = 95.563 us, max = 95.563 us, min = 95.563 us, total = 95.563 us
ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 135.089 us, total = 135.089 us, Queueing time: mean = 127.446 us, max = 127.446 us, min = 127.446 us, total = 127.446 us
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.027 s, total = 1.027 s, Queueing time: mean = 100.553 us, max = 100.553 us, min = 100.553 us, total = 100.553 us
ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.551 ms, total = 2.551 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.897 ms, total = 1.897 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.739 ms, total = 1.739 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 45.977 us, total = 45.977 us, Queueing time: mean = 125.739 us, max = 125.739 us, min = 125.739 us, total = 125.739 us
ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 219.003 us, total = 219.003 us, Queueing time: mean = 80.757 us, max = 80.757 us, min = 80.757 us, total = 80.757 us
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 263.972 us, total = 263.972 us, Queueing time: mean = 150.282 us, max = 150.282 us, min = 150.282 us, total = 150.282 us
NodeManager.deadline_timer.print_event_loop_stats - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.589 ms, total = 1.589 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 2.020 ms, total = 2.020 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 91.662 us, total = 91.662 us, Queueing time: mean = 261.030 us, max = 261.030 us, min = 261.030 us, total = 261.030 us
DebugString() time ms: 2