|
[2025-01-17 17:06:14,589 I 9569 9569] (gcs_server) gcs_server_main.cc:52: Ray cluster metadata ray_version=2.40.0 ray_commit=22541c38dbef25286cd6d19f1c151bf4fd62f2ed |
|
[2025-01-17 17:06:14,589 I 9569 9569] (gcs_server) io_service_pool.cc:35: IOServicePool is running with 1 io_service. |
|
[2025-01-17 17:06:14,595 I 9569 9569] (gcs_server) event.cc:493: Ray Event initialized for GCS |
|
[2025-01-17 17:06:14,595 I 9569 9569] (gcs_server) event.cc:493: Ray Event initialized for EXPORT_NODE |
|
[2025-01-17 17:06:14,595 I 9569 9569] (gcs_server) event.cc:493: Ray Event initialized for EXPORT_ACTOR |
|
[2025-01-17 17:06:14,595 I 9569 9569] (gcs_server) event.cc:493: Ray Event initialized for EXPORT_DRIVER_JOB |
|
[2025-01-17 17:06:14,595 I 9569 9569] (gcs_server) event.cc:324: Set ray event level to warning |
|
[2025-01-17 17:06:14,599 I 9569 9569] (gcs_server) gcs_server.cc:73: GCS storage type is StorageType::IN_MEMORY |
|
[2025-01-17 17:06:14,601 I 9569 9569] (gcs_server) gcs_init_data.cc:42: Loading job table data. |
|
[2025-01-17 17:06:14,601 I 9569 9569] (gcs_server) gcs_init_data.cc:54: Loading node table data. |
|
[2025-01-17 17:06:14,601 I 9569 9569] (gcs_server) gcs_init_data.cc:80: Loading actor table data. |
|
[2025-01-17 17:06:14,601 I 9569 9569] (gcs_server) gcs_init_data.cc:93: Loading actor task spec table data. |
|
[2025-01-17 17:06:14,601 I 9569 9569] (gcs_server) gcs_init_data.cc:66: Loading placement group table data. |
|
[2025-01-17 17:06:14,602 I 9569 9569] (gcs_server) gcs_init_data.cc:46: Finished loading job table data, size = 0 |
|
[2025-01-17 17:06:14,602 I 9569 9569] (gcs_server) gcs_init_data.cc:58: Finished loading node table data, size = 0 |
|
[2025-01-17 17:06:14,602 I 9569 9569] (gcs_server) gcs_init_data.cc:84: Finished loading actor table data, size = 0 |
|
[2025-01-17 17:06:14,602 I 9569 9569] (gcs_server) gcs_init_data.cc:97: Finished loading actor task spec table data, size = 0 |
|
[2025-01-17 17:06:14,602 I 9569 9569] (gcs_server) gcs_init_data.cc:71: Finished loading placement group table data, size = 0 |
|
[2025-01-17 17:06:14,602 I 9569 9569] (gcs_server) gcs_server.cc:162: No existing server cluster ID found. Generating new ID: 89ce0fd4a5ecdb913e19c717c43de3b7cb06bc2ce49d59bd917d5b31 |
|
[2025-01-17 17:06:14,603 I 9569 9569] (gcs_server) gcs_server.cc:644: Autoscaler V2 enabled: 0 |
|
[2025-01-17 17:06:14,606 I 9569 9569] (gcs_server) grpc_server.cc:134: GcsServer server started, listening on port 53056. |
|
[2025-01-17 17:06:14,877 I 9569 9569] (gcs_server) gcs_server.cc:245: Gcs Debug state: |
|
|
|
GcsNodeManager: |
|
- RegisterNode request count: 0 |
|
- DrainNode request count: 0 |
|
- GetAllNodeInfo request count: 0 |
|
|
|
GcsActorManager: |
|
- RegisterActor request count: 0 |
|
- CreateActor request count: 0 |
|
- GetActorInfo request count: 0 |
|
- GetNamedActorInfo request count: 0 |
|
- GetAllActorInfo request count: 0 |
|
- KillActor request count: 0 |
|
- ListNamedActors request count: 0 |
|
- Registered actors count: 0 |
|
- Destroyed actors count: 0 |
|
- Named actors count: 0 |
|
- Unresolved actors count: 0 |
|
- Pending actors count: 0 |
|
- Created actors count: 0 |
|
- owners_: 0 |
|
- actor_to_register_callbacks_: 0 |
|
- actor_to_restart_callbacks_: 0 |
|
- actor_to_create_callbacks_: 0 |
|
- sorted_destroyed_actor_list_: 0 |
|
|
|
GcsResourceManager: |
|
- GetAllAvailableResources request count: 0 |
|
- GetAllTotalResources request count: 0 |
|
- GetAllResourceUsage request count: 0 |
|
|
|
GcsPlacementGroupManager: |
|
- CreatePlacementGroup request count: 0 |
|
- RemovePlacementGroup request count: 0 |
|
- GetPlacementGroup request count: 0 |
|
- GetAllPlacementGroup request count: 0 |
|
- WaitPlacementGroupUntilReady request count: 0 |
|
- GetNamedPlacementGroup request count: 0 |
|
- Scheduling pending placement group count: 0 |
|
- Registered placement groups count: 0 |
|
- Named placement group count: 0 |
|
- Pending placement groups count: 0 |
|
- Infeasible placement groups count: 0 |
|
|
|
Publisher: |
|
|
|
[runtime env manager] ID to URIs table: |
|
[runtime env manager] URIs reference table: |
|
|
|
GcsTaskManager: |
|
-Total num task events reported: 0 |
|
-Total num status task events dropped: 0 |
|
-Total num profile events dropped: 0 |
|
-Current num of task events stored: 0 |
|
-Total num of actor creation tasks: 0 |
|
-Total num of actor tasks: 0 |
|
-Total num of normal tasks: 0 |
|
-Total num of driver tasks: 0 |
|
|
|
GcsAutoscalerStateManager: |
|
- last_seen_autoscaler_state_version_: 0 |
|
- last_cluster_resource_state_version_: 0 |
|
- pending demands: |
|
|
|
|
|
|
|
[2025-01-17 17:06:14,877 I 9569 9569] (gcs_server) gcs_server.cc:843: Main service Event stats: |
|
|
|
|
|
Global stats: 25 total (5 active) |
|
Queueing time: mean = 97.962 ms, max = 271.392 ms, min = 4.930 us, total = 2.449 s |
|
Execution time: mean = 11.011 ms, total = 275.270 ms |
|
Event stats: |
|
GcsInMemoryStore.Put - 9 total (0 active), Execution time: mean = 30.161 ms, total = 271.446 ms, Queueing time: mean = 209.897 ms, max = 270.469 ms, min = 4.930 us, total = 1.889 s |
|
GcsInMemoryStore.GetAll - 5 total (0 active), Execution time: mean = 18.656 us, total = 93.278 us, Queueing time: mean = 117.426 us, max = 129.597 us, min = 105.642 us, total = 587.132 us |
|
PeriodicalRunner.RunFnPeriodically - 4 total (2 active, 1 running), Execution time: mean = 2.785 us, total = 11.141 us, Queueing time: mean = 135.644 ms, max = 271.392 ms, min = 271.184 ms, total = 542.576 ms |
|
event_loop_lag_probe - 2 total (0 active), Execution time: mean = 16.697 us, total = 33.394 us, Queueing time: mean = 6.107 ms, max = 11.825 ms, min = 389.335 us, total = 12.215 ms |
|
GcsInMemoryStore.Get - 1 total (0 active), Execution time: mean = 29.016 us, total = 29.016 us, Queueing time: mean = 6.870 us, max = 6.870 us, min = 6.870 us, total = 6.870 us |
|
NodeInfoGcsService.grpc_server.GetClusterId.HandleRequestImpl - 1 total (0 active), Execution time: mean = 3.657 ms, total = 3.657 ms, Queueing time: mean = 4.585 ms, max = 4.585 ms, min = 4.585 ms, total = 4.585 ms |
|
RayletLoadPulled - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
NodeInfoGcsService.grpc_server.GetClusterId - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
ClusterResourceManager.ResetRemoteNodeView - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
|
|
|
|
[2025-01-17 17:06:14,877 I 9569 9569] (gcs_server) gcs_server.cc:847: task_io_context Event stats: |
|
|
|
|
|
Global stats: 5 total (1 active) |
|
Queueing time: mean = 1.233 ms, max = 6.034 ms, min = 9.247 us, total = 6.166 ms |
|
Execution time: mean = 30.476 us, total = 152.380 us |
|
Event stats: |
|
event_loop_lag_probe - 3 total (0 active), Execution time: mean = 47.016 us, total = 141.048 us, Queueing time: mean = 2.019 ms, max = 6.034 ms, min = 9.247 us, total = 6.058 ms |
|
PeriodicalRunner.RunFnPeriodically - 1 total (0 active), Execution time: mean = 11.332 us, total = 11.332 us, Queueing time: mean = 108.002 us, max = 108.002 us, min = 108.002 us, total = 108.002 us |
|
GcsTaskManager.GcJobSummary - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
|
|
|
|
[2025-01-17 17:06:14,877 I 9569 9569] (gcs_server) gcs_server.cc:847: pubsub_io_context Event stats: |
|
|
|
|
|
Global stats: 5 total (1 active) |
|
Queueing time: mean = 249.829 us, max = 1.089 ms, min = 10.854 us, total = 1.249 ms |
|
Execution time: mean = 94.198 us, total = 470.990 us |
|
Event stats: |
|
event_loop_lag_probe - 3 total (0 active), Execution time: mean = 147.856 us, total = 443.568 us, Queueing time: mean = 383.518 us, max = 1.089 ms, min = 10.854 us, total = 1.151 ms |
|
Publisher.CheckDeadSubscribers - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s |
|
PeriodicalRunner.RunFnPeriodically - 1 total (0 active), Execution time: mean = 27.422 us, total = 27.422 us, Queueing time: mean = 98.594 us, max = 98.594 us, min = 98.594 us, total = 98.594 us |
|
|
|
|
|
[2025-01-17 17:06:14,877 I 9569 9569] (gcs_server) gcs_server.cc:847: ray_syncer_io_context Event stats: |
|
|
|
|
|
Global stats: 5 total (0 active) |
|
Queueing time: mean = 2.704 ms, max = 8.947 ms, min = 9.525 us, total = 13.521 ms |
|
Execution time: mean = 155.087 us, total = 775.433 us |
|
Event stats: |
|
event_loop_lag_probe - 3 total (0 active), Execution time: mean = 257.996 us, total = 773.989 us, Queueing time: mean = 3.003 ms, max = 8.947 ms, min = 9.525 us, total = 9.009 ms |
|
RaySyncerRegister - 2 total (0 active), Execution time: mean = 722.000 ns, total = 1.444 us, Queueing time: mean = 2.256 ms, max = 2.256 ms, min = 2.255 ms, total = 4.511 ms |
|
|
|
|
|
[2025-01-17 17:06:17,163 I 9569 9569] (gcs_server) gcs_node_manager.cc:85: Registering node info, address = 192.168.0.2, node name = 192.168.0.2 node_id=1cc7243d4d7faf0b5672664c331eda22d6e6a5d17cce88079d187efc |
|
[2025-01-17 17:06:17,163 I 9569 9569] (gcs_server) gcs_node_manager.cc:91: Finished registering node info, address = 192.168.0.2, node name = 192.168.0.2, is_head_node = 1 node_id=1cc7243d4d7faf0b5672664c331eda22d6e6a5d17cce88079d187efc |
|
[2025-01-17 17:06:17,164 I 9569 9569] (gcs_server) gcs_placement_group_manager.cc:819: A new node: 1cc7243d4d7faf0b5672664c331eda22d6e6a5d17cce88079d187efc registered, will try to reschedule all the infeasible placement groups. |
|
[2025-01-17 17:06:17,172 I 9569 9657] (gcs_server) ray_syncer.cc:377: Get connection node_id=1cc7243d4d7faf0b5672664c331eda22d6e6a5d17cce88079d187efc |
|
[2025-01-17 17:06:24,544 I 9569 9569] (gcs_server) gcs_job_manager.cc:90: Adding job, job id = 01000000, driver pid = 9501 |
|
[2025-01-17 17:06:24,544 I 9569 9569] (gcs_server) gcs_job_manager.cc:111: Finished adding job, job id = 01000000, driver pid = 9501 |
|
[2025-01-17 17:06:24,658 W 9569 9592] (gcs_server) metric_exporter.cc:105: [1] Export metrics to agent failed: RpcError: RPC Error message: failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:63676: Failed to connect to remote host: Connection refused; RPC Error details: . This won't affect Ray, but you can lose metrics from the cluster. |
|
[2025-01-17 17:06:31,057 I 9569 9569] (gcs_server) gcs_actor_manager.cc:393: Registering actor job_id=01000000 actor_id=88ab808e4cbbeec0e9927bd101000000 |
|
[2025-01-17 17:06:31,057 I 9569 9569] (gcs_server) gcs_actor_manager.cc:398: Registered actor, job id = 01000000, actor id = 88ab808e4cbbeec0e9927bd101000000 |
|
[2025-01-17 17:06:31,059 I 9569 9569] (gcs_server) gcs_actor_manager.cc:479: Creating actor job_id=01000000 actor_id=88ab808e4cbbeec0e9927bd101000000 |
|
[2025-01-17 17:06:31,059 I 9569 9569] (gcs_server) gcs_actor_scheduler.cc:313: Start leasing worker from node 1cc7243d4d7faf0b5672664c331eda22d6e6a5d17cce88079d187efc for actor 88ab808e4cbbeec0e9927bd101000000, job id = 01000000 |
|
[2025-01-17 17:06:31,061 I 9569 9569] (gcs_server) gcs_actor_scheduler.cc:633: Finished leasing worker from 1cc7243d4d7faf0b5672664c331eda22d6e6a5d17cce88079d187efc for actor 88ab808e4cbbeec0e9927bd101000000, job id = 01000000 |
|
[2025-01-17 17:06:31,061 I 9569 9569] (gcs_server) gcs_actor_scheduler.cc:448: Start creating actor 88ab808e4cbbeec0e9927bd101000000 on worker 3bb4461a97785cb0d8c35531d6b1b95ed047f5f21c22790bbb90baa2 at node 1cc7243d4d7faf0b5672664c331eda22d6e6a5d17cce88079d187efc, job id = 01000000 |
|
[2025-01-17 17:06:32,183 I 9569 9569] (gcs_server) gcs_actor_scheduler.cc:485: Finished actor creation task for actor 88ab808e4cbbeec0e9927bd101000000 on worker 3bb4461a97785cb0d8c35531d6b1b95ed047f5f21c22790bbb90baa2 at node 1cc7243d4d7faf0b5672664c331eda22d6e6a5d17cce88079d187efc, job id = 01000000 |
|
[2025-01-17 17:06:32,183 I 9569 9569] (gcs_server) gcs_actor_manager.cc:1530: Actor created successfully job_id=01000000 actor_id=88ab808e4cbbeec0e9927bd101000000 |
|
[2025-01-17 17:06:32,184 I 9569 9569] (gcs_server) gcs_actor_manager.cc:494: Finished creating actor. Status: OK job_id=01000000 actor_id=88ab808e4cbbeec0e9927bd101000000 |
|
[2025-01-17 17:06:36,847 I 9569 9569] (gcs_server) gcs_actor_manager.cc:393: Registering actor job_id=01000000 actor_id=124b096e2a41077fec1e8e1601000000 |
|
[2025-01-17 17:06:36,848 I 9569 9569] (gcs_server) gcs_actor_manager.cc:398: Registered actor, job id = 01000000, actor id = 124b096e2a41077fec1e8e1601000000 |
|
[2025-01-17 17:06:36,850 I 9569 9569] (gcs_server) gcs_actor_manager.cc:479: Creating actor job_id=01000000 actor_id=124b096e2a41077fec1e8e1601000000 |
|
[2025-01-17 17:06:36,850 I 9569 9569] (gcs_server) gcs_actor_scheduler.cc:313: Start leasing worker from node 1cc7243d4d7faf0b5672664c331eda22d6e6a5d17cce88079d187efc for actor 124b096e2a41077fec1e8e1601000000, job id = 01000000 |
|
[2025-01-17 17:06:36,852 I 9569 9569] (gcs_server) gcs_actor_scheduler.cc:633: Finished leasing worker from 1cc7243d4d7faf0b5672664c331eda22d6e6a5d17cce88079d187efc for actor 124b096e2a41077fec1e8e1601000000, job id = 01000000 |
|
[2025-01-17 17:06:36,852 I 9569 9569] (gcs_server) gcs_actor_scheduler.cc:448: Start creating actor 124b096e2a41077fec1e8e1601000000 on worker 522a0f4099acdc01500b9ffb59119de77306126837ba24d26ad2af7f at node 1cc7243d4d7faf0b5672664c331eda22d6e6a5d17cce88079d187efc, job id = 01000000 |
|
[2025-01-17 17:06:36,865 I 9569 9569] (gcs_server) gcs_actor_scheduler.cc:485: Finished actor creation task for actor 124b096e2a41077fec1e8e1601000000 on worker 522a0f4099acdc01500b9ffb59119de77306126837ba24d26ad2af7f at node 1cc7243d4d7faf0b5672664c331eda22d6e6a5d17cce88079d187efc, job id = 01000000 |
|
[2025-01-17 17:06:36,865 I 9569 9569] (gcs_server) gcs_actor_manager.cc:1530: Actor created successfully job_id=01000000 actor_id=124b096e2a41077fec1e8e1601000000 |
|
[2025-01-17 17:06:36,866 I 9569 9569] (gcs_server) gcs_actor_manager.cc:494: Finished creating actor. Status: OK job_id=01000000 actor_id=124b096e2a41077fec1e8e1601000000 |
|
[2025-01-17 17:06:36,866 I 9569 9569] (gcs_server) gcs_actor_manager.cc:393: Registering actor job_id=01000000 actor_id=ca5f15063e8159e6751f8d2901000000 |
|
[2025-01-17 17:06:36,866 W 9569 9569] (gcs_server) gcs_actor_manager.cc:403: Failed to register actor: NotFound: Actor with name 'AutoscalingRequester' already exists in the namespace AutoscalingRequester job_id=01000000 actor_id=ca5f15063e8159e6751f8d2901000000 |
|
[2025-01-17 17:07:05,780 I 9569 9569] (gcs_server) gcs_job_manager.cc:149: Finished marking job state, job id = 01000000 |
|
[2025-01-17 17:07:06,864 I 9569 9569] (gcs_server) gcs_node_manager.cc:366: Removing node, node name = 192.168.0.2, death reason = EXPECTED_TERMINATION, death message = received SIGTERM node_id=1cc7243d4d7faf0b5672664c331eda22d6e6a5d17cce88079d187efc |
|
[2025-01-17 17:07:06,865 I 9569 9569] (gcs_server) gcs_placement_group_manager.cc:789: Node failed, rescheduling the placement groups on the dead node. node_id=1cc7243d4d7faf0b5672664c331eda22d6e6a5d17cce88079d187efc |
|
[2025-01-17 17:07:06,865 I 9569 9569] (gcs_server) gcs_actor_manager.cc:1274: Node failed, reconstructing actors. node_id=1cc7243d4d7faf0b5672664c331eda22d6e6a5d17cce88079d187efc |
|
[2025-01-17 17:07:06,865 I 9569 9569] (gcs_server) gcs_actor_manager.cc:1397: Actor is failed on worker 3bb4461a97785cb0d8c35531d6b1b95ed047f5f21c22790bbb90baa2 at node 1cc7243d4d7faf0b5672664c331eda22d6e6a5d17cce88079d187efc, need_reschedule = 1, death context type = ActorDiedErrorContext, remaining_restarts = 0 job_id=01000000 actor_id=88ab808e4cbbeec0e9927bd101000000 |
|
[2025-01-17 17:07:06,865 I 9569 9569] (gcs_server) gcs_actor_manager.cc:936: Actor name datasets_stats_actor is cleand up. |
|
[2025-01-17 17:07:06,865 I 9569 9569] (gcs_server) gcs_actor_manager.cc:1397: Actor is failed on worker 522a0f4099acdc01500b9ffb59119de77306126837ba24d26ad2af7f at node 1cc7243d4d7faf0b5672664c331eda22d6e6a5d17cce88079d187efc, need_reschedule = 1, death context type = ActorDiedErrorContext, remaining_restarts = -1 job_id=01000000 actor_id=124b096e2a41077fec1e8e1601000000 |
|
[2025-01-17 17:07:06,865 I 9569 9569] (gcs_server) gcs_job_manager.cc:454: Node failed, mark all jobs from this node as finished node_id=1cc7243d4d7faf0b5672664c331eda22d6e6a5d17cce88079d187efc |
|
[2025-01-17 17:07:06,865 I 9569 9569] (gcs_server) gcs_actor_manager.cc:1023: Destroying actor job_id=01000000 actor_id=88ab808e4cbbeec0e9927bd101000000 |
|
[2025-01-17 17:07:07,160 I 9569 9569] (gcs_server) gcs_server_main.cc:130: GCS server received SIGTERM, shutting down... |
|
[2025-01-17 17:07:07,163 I 9569 9569] (gcs_server) gcs_server.cc:267: Stopping GCS server. |
|
|