[2025-01-15 18:18:10,993 I 537971 537971] (gcs_server) gcs_server_main.cc:52: Ray cluster metadata ray_version=2.40.0 ray_commit=22541c38dbef25286cd6d19f1c151bf4fd62f2ed
[2025-01-15 18:18:10,993 I 537971 537971] (gcs_server) io_service_pool.cc:35: IOServicePool is running with 1 io_service.
[2025-01-15 18:18:10,999 I 537971 537971] (gcs_server) event.cc:493: Ray Event initialized for GCS
[2025-01-15 18:18:10,999 I 537971 537971] (gcs_server) event.cc:493: Ray Event initialized for EXPORT_NODE
[2025-01-15 18:18:10,999 I 537971 537971] (gcs_server) event.cc:493: Ray Event initialized for EXPORT_ACTOR
[2025-01-15 18:18:11,000 I 537971 537971] (gcs_server) event.cc:493: Ray Event initialized for EXPORT_DRIVER_JOB
[2025-01-15 18:18:11,000 I 537971 537971] (gcs_server) event.cc:324: Set ray event level to warning
[2025-01-15 18:18:11,007 I 537971 537971] (gcs_server) gcs_server.cc:73: GCS storage type is StorageType::IN_MEMORY
[2025-01-15 18:18:11,009 I 537971 537971] (gcs_server) gcs_init_data.cc:42: Loading job table data.
[2025-01-15 18:18:11,009 I 537971 537971] (gcs_server) gcs_init_data.cc:54: Loading node table data.
[2025-01-15 18:18:11,009 I 537971 537971] (gcs_server) gcs_init_data.cc:80: Loading actor table data.
[2025-01-15 18:18:11,009 I 537971 537971] (gcs_server) gcs_init_data.cc:93: Loading actor task spec table data.
[2025-01-15 18:18:11,009 I 537971 537971] (gcs_server) gcs_init_data.cc:66: Loading placement group table data.
[2025-01-15 18:18:11,009 I 537971 537971] (gcs_server) gcs_init_data.cc:46: Finished loading job table data, size = 0
[2025-01-15 18:18:11,009 I 537971 537971] (gcs_server) gcs_init_data.cc:58: Finished loading node table data, size = 0
[2025-01-15 18:18:11,009 I 537971 537971] (gcs_server) gcs_init_data.cc:84: Finished loading actor table data, size = 0
[2025-01-15 18:18:11,009 I 537971 537971] (gcs_server) gcs_init_data.cc:97: Finished loading actor task spec table data, size = 0
[2025-01-15 18:18:11,009 I 537971 537971] (gcs_server) gcs_init_data.cc:71: Finished loading placement group table data, size = 0
[2025-01-15 18:18:11,009 I 537971 537971] (gcs_server) gcs_server.cc:162: No existing server cluster ID found. Generating new ID: b5a2fe24b4d79a2c32d29b776b9ff3f360bca7b16257a52509c99232
[2025-01-15 18:18:11,010 I 537971 537971] (gcs_server) gcs_server.cc:644: Autoscaler V2 enabled: 0
[2025-01-15 18:18:11,013 I 537971 537971] (gcs_server) grpc_server.cc:134: GcsServer server started, listening on port 63970.
[2025-01-15 18:18:11,261 I 537971 537971] (gcs_server) gcs_server.cc:245: Gcs Debug state:
GcsNodeManager:
- RegisterNode request count: 0
- DrainNode request count: 0
- GetAllNodeInfo request count: 0
GcsActorManager:
- RegisterActor request count: 0
- CreateActor request count: 0
- GetActorInfo request count: 0
- GetNamedActorInfo request count: 0
- GetAllActorInfo request count: 0
- KillActor request count: 0
- ListNamedActors request count: 0
- Registered actors count: 0
- Destroyed actors count: 0
- Named actors count: 0
- Unresolved actors count: 0
- Pending actors count: 0
- Created actors count: 0
- owners_: 0
- actor_to_register_callbacks_: 0
- actor_to_restart_callbacks_: 0
- actor_to_create_callbacks_: 0
- sorted_destroyed_actor_list_: 0
GcsResourceManager:
- GetAllAvailableResources request count: 0
- GetAllTotalResources request count: 0
- GetAllResourceUsage request count: 0
GcsPlacementGroupManager:
- CreatePlacementGroup request count: 0
- RemovePlacementGroup request count: 0
- GetPlacementGroup request count: 0
- GetAllPlacementGroup request count: 0
- WaitPlacementGroupUntilReady request count: 0
- GetNamedPlacementGroup request count: 0
- Scheduling pending placement group count: 0
- Registered placement groups count: 0
- Named placement group count: 0
- Pending placement groups count: 0
- Infeasible placement groups count: 0
Publisher:
[runtime env manager] ID to URIs table:
[runtime env manager] URIs reference table:
GcsTaskManager:
-Total num task events reported: 0
-Total num status task events dropped: 0
-Total num profile events dropped: 0
-Current num of task events stored: 0
-Total num of actor creation tasks: 0
-Total num of actor tasks: 0
-Total num of normal tasks: 0
-Total num of driver tasks: 0
GcsAutoscalerStateManager:
- last_seen_autoscaler_state_version_: 0
- last_cluster_resource_state_version_: 0
- pending demands:
[2025-01-15 18:18:11,261 I 537971 537971] (gcs_server) gcs_server.cc:843: Main service Event stats:
Global stats: 25 total (5 active)
Queueing time: mean = 90.197 ms, max = 249.567 ms, min = 4.242 us, total = 2.255 s
Execution time: mean = 10.073 ms, total = 251.823 ms
Event stats:
GcsInMemoryStore.Put - 9 total (0 active), Execution time: mean = 27.717 ms, total = 249.456 ms, Queueing time: mean = 193.212 ms, max = 248.944 ms, min = 4.242 us, total = 1.739 s
GcsInMemoryStore.GetAll - 5 total (0 active), Execution time: mean = 17.181 us, total = 85.906 us, Queueing time: mean = 116.348 us, max = 125.095 us, min = 105.597 us, total = 581.738 us
PeriodicalRunner.RunFnPeriodically - 4 total (2 active, 1 running), Execution time: mean = 88.208 us, total = 352.832 us, Queueing time: mean = 124.746 ms, max = 249.567 ms, min = 249.419 ms, total = 498.986 ms
event_loop_lag_probe - 2 total (0 active), Execution time: mean = 18.654 us, total = 37.309 us, Queueing time: mean = 7.731 ms, max = 15.136 ms, min = 326.630 us, total = 15.463 ms
NodeInfoGcsService.grpc_server.GetClusterId - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ClusterResourceManager.ResetRemoteNodeView - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
RayletLoadPulled - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
NodeInfoGcsService.grpc_server.GetClusterId.HandleRequestImpl - 1 total (0 active), Execution time: mean = 1.866 ms, total = 1.866 ms, Queueing time: mean = 975.640 us, max = 975.640 us, min = 975.640 us, total = 975.640 us
GcsInMemoryStore.Get - 1 total (0 active), Execution time: mean = 25.696 us, total = 25.696 us, Queueing time: mean = 6.459 us, max = 6.459 us, min = 6.459 us, total = 6.459 us
[2025-01-15 18:18:11,261 I 537971 537971] (gcs_server) gcs_server.cc:847: task_io_context Event stats:
Global stats: 5 total (1 active)
Queueing time: mean = 384.144 us, max = 1.376 ms, min = 9.744 us, total = 1.921 ms
Execution time: mean = 757.232 us, total = 3.786 ms
Event stats:
event_loop_lag_probe - 3 total (0 active), Execution time: mean = 1.257 ms, total = 3.772 ms, Queueing time: mean = 608.958 us, max = 1.376 ms, min = 9.744 us, total = 1.827 ms
PeriodicalRunner.RunFnPeriodically - 1 total (0 active), Execution time: mean = 13.713 us, total = 13.713 us, Queueing time: mean = 93.848 us, max = 93.848 us, min = 93.848 us, total = 93.848 us
GcsTaskManager.GcJobSummary - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[2025-01-15 18:18:11,261 I 537971 537971] (gcs_server) gcs_server.cc:847: pubsub_io_context Event stats:
Global stats: 5 total (1 active)
Queueing time: mean = 1.207 ms, max = 5.858 ms, min = 8.717 us, total = 6.035 ms
Execution time: mean = 183.696 us, total = 918.478 us
Event stats:
event_loop_lag_probe - 3 total (0 active), Execution time: mean = 300.755 us, total = 902.266 us, Queueing time: mean = 1.969 ms, max = 5.858 ms, min = 8.717 us, total = 5.908 ms
Publisher.CheckDeadSubscribers - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
PeriodicalRunner.RunFnPeriodically - 1 total (0 active), Execution time: mean = 16.212 us, total = 16.212 us, Queueing time: mean = 126.750 us, max = 126.750 us, min = 126.750 us, total = 126.750 us
[2025-01-15 18:18:11,261 I 537971 537971] (gcs_server) gcs_server.cc:847: ray_syncer_io_context Event stats:
Global stats: 5 total (0 active)
Queueing time: mean = 1.667 ms, max = 8.092 ms, min = 8.736 us, total = 8.337 ms
Execution time: mean = 45.877 us, total = 229.387 us
Event stats:
event_loop_lag_probe - 3 total (0 active), Execution time: mean = 75.741 us, total = 227.223 us, Queueing time: mean = 2.710 ms, max = 8.092 ms, min = 8.736 us, total = 8.131 ms
RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.082 us, total = 2.164 us, Queueing time: mean = 103.088 us, max = 105.416 us, min = 100.760 us, total = 206.176 us
[2025-01-15 18:18:13,525 I 537971 537971] (gcs_server) gcs_node_manager.cc:85: Registering node info, address = 192.168.0.2, node name = 192.168.0.2 node_id=cb077e6889c43a72ef05e35fce4524837e135a1a964b1abd9c68b471
[2025-01-15 18:18:13,525 I 537971 537971] (gcs_server) gcs_node_manager.cc:91: Finished registering node info, address = 192.168.0.2, node name = 192.168.0.2, is_head_node = 1 node_id=cb077e6889c43a72ef05e35fce4524837e135a1a964b1abd9c68b471
[2025-01-15 18:18:13,525 I 537971 537971] (gcs_server) gcs_placement_group_manager.cc:819: A new node: cb077e6889c43a72ef05e35fce4524837e135a1a964b1abd9c68b471 registered, will try to reschedule all the infeasible placement groups.
[2025-01-15 18:18:13,532 I 537971 538046] (gcs_server) ray_syncer.cc:377: Get connection node_id=cb077e6889c43a72ef05e35fce4524837e135a1a964b1abd9c68b471
[2025-01-15 18:18:14,614 I 537971 537971] (gcs_server) gcs_job_manager.cc:90: Adding job, job id = 01000000, driver pid = 537904
[2025-01-15 18:18:14,614 I 537971 537971] (gcs_server) gcs_job_manager.cc:111: Finished adding job, job id = 01000000, driver pid = 537904
[2025-01-15 18:18:14,889 I 537971 537971] (gcs_server) gcs_job_manager.cc:149: Finished marking job state, job id = 01000000
[2025-01-15 18:18:15,094 I 537971 537971] (gcs_server) gcs_node_manager.cc:366: Removing node, node name = 192.168.0.2, death reason = EXPECTED_TERMINATION, death message = received SIGTERM node_id=cb077e6889c43a72ef05e35fce4524837e135a1a964b1abd9c68b471
[2025-01-15 18:18:15,094 I 537971 537971] (gcs_server) gcs_placement_group_manager.cc:789: Node failed, rescheduling the placement groups on the dead node. node_id=cb077e6889c43a72ef05e35fce4524837e135a1a964b1abd9c68b471
[2025-01-15 18:18:15,094 I 537971 537971] (gcs_server) gcs_actor_manager.cc:1274: Node failed, reconstructing actors. node_id=cb077e6889c43a72ef05e35fce4524837e135a1a964b1abd9c68b471
[2025-01-15 18:18:15,094 I 537971 537971] (gcs_server) gcs_job_manager.cc:454: Node failed, mark all jobs from this node as finished node_id=cb077e6889c43a72ef05e35fce4524837e135a1a964b1abd9c68b471
[2025-01-15 18:18:15,343 I 537971 538020] (gcs_server) ray_syncer-inl.h:318: Failed to read the message from: cb077e6889c43a72ef05e35fce4524837e135a1a964b1abd9c68b471
[2025-01-15 18:18:15,344 I 537971 538020] (gcs_server) ray_syncer.cc:373: Connection is broken. node_id=cb077e6889c43a72ef05e35fce4524837e135a1a964b1abd9c68b471
[2025-01-15 18:18:15,358 I 537971 537971] (gcs_server) gcs_server_main.cc:130: GCS server received SIGTERM, shutting down...
[2025-01-15 18:18:15,360 I 537971 537971] (gcs_server) gcs_server.cc:267: Stopping GCS server.
[2025-01-15 18:18:15,447 I 537971 537971] (gcs_server) gcs_server.cc:284: GCS server stopped.
[2025-01-15 18:18:15,447 I 537971 537971] (gcs_server) io_service_pool.cc:47: IOServicePool is stopped.
[2025-01-15 18:18:15,504 I 537971 537971] (gcs_server) stats.h:120: Stats module has shutdown.