[2025-01-15 18:17:47,765 I 533898 533898] (gcs_server) gcs_server_main.cc:52: Ray cluster metadata ray_version=2.40.0 ray_commit=22541c38dbef25286cd6d19f1c151bf4fd62f2ed
[2025-01-15 18:17:47,765 I 533898 533898] (gcs_server) io_service_pool.cc:35: IOServicePool is running with 1 io_service.
[2025-01-15 18:17:47,771 I 533898 533898] (gcs_server) event.cc:493: Ray Event initialized for GCS
[2025-01-15 18:17:47,771 I 533898 533898] (gcs_server) event.cc:493: Ray Event initialized for EXPORT_NODE
[2025-01-15 18:17:47,771 I 533898 533898] (gcs_server) event.cc:493: Ray Event initialized for EXPORT_ACTOR
[2025-01-15 18:17:47,771 I 533898 533898] (gcs_server) event.cc:493: Ray Event initialized for EXPORT_DRIVER_JOB
[2025-01-15 18:17:47,771 I 533898 533898] (gcs_server) event.cc:324: Set ray event level to warning
[2025-01-15 18:17:47,779 I 533898 533898] (gcs_server) gcs_server.cc:73: GCS storage type is StorageType::IN_MEMORY
[2025-01-15 18:17:47,780 I 533898 533898] (gcs_server) gcs_init_data.cc:42: Loading job table data.
[2025-01-15 18:17:47,780 I 533898 533898] (gcs_server) gcs_init_data.cc:54: Loading node table data.
[2025-01-15 18:17:47,780 I 533898 533898] (gcs_server) gcs_init_data.cc:80: Loading actor table data.
[2025-01-15 18:17:47,780 I 533898 533898] (gcs_server) gcs_init_data.cc:93: Loading actor task spec table data.
[2025-01-15 18:17:47,780 I 533898 533898] (gcs_server) gcs_init_data.cc:66: Loading placement group table data.
[2025-01-15 18:17:47,780 I 533898 533898] (gcs_server) gcs_init_data.cc:46: Finished loading job table data, size = 0
[2025-01-15 18:17:47,780 I 533898 533898] (gcs_server) gcs_init_data.cc:58: Finished loading node table data, size = 0
[2025-01-15 18:17:47,780 I 533898 533898] (gcs_server) gcs_init_data.cc:84: Finished loading actor table data, size = 0
[2025-01-15 18:17:47,780 I 533898 533898] (gcs_server) gcs_init_data.cc:97: Finished loading actor task spec table data, size = 0
[2025-01-15 18:17:47,780 I 533898 533898] (gcs_server) gcs_init_data.cc:71: Finished loading placement group table data, size = 0
[2025-01-15 18:17:47,780 I 533898 533898] (gcs_server) gcs_server.cc:162: No existing server cluster ID found. Generating new ID: e5dec9027e386ac6ec6f39a457e169a0f2a4e318ad71fe99cf74ef20
[2025-01-15 18:17:47,781 I 533898 533898] (gcs_server) gcs_server.cc:644: Autoscaler V2 enabled: 0
[2025-01-15 18:17:47,786 I 533898 533898] (gcs_server) grpc_server.cc:134: GcsServer server started, listening on port 62933.
[2025-01-15 18:17:48,040 I 533898 533898] (gcs_server) gcs_server.cc:245: Gcs Debug state:
GcsNodeManager:
- RegisterNode request count: 0
- DrainNode request count: 0
- GetAllNodeInfo request count: 0
GcsActorManager:
- RegisterActor request count: 0
- CreateActor request count: 0
- GetActorInfo request count: 0
- GetNamedActorInfo request count: 0
- GetAllActorInfo request count: 0
- KillActor request count: 0
- ListNamedActors request count: 0
- Registered actors count: 0
- Destroyed actors count: 0
- Named actors count: 0
- Unresolved actors count: 0
- Pending actors count: 0
- Created actors count: 0
- owners_: 0
- actor_to_register_callbacks_: 0
- actor_to_restart_callbacks_: 0
- actor_to_create_callbacks_: 0
- sorted_destroyed_actor_list_: 0
GcsResourceManager:
- GetAllAvailableResources request count: 0
- GetAllTotalResources request count: 0
- GetAllResourceUsage request count: 0
GcsPlacementGroupManager:
- CreatePlacementGroup request count: 0
- RemovePlacementGroup request count: 0
- GetPlacementGroup request count: 0
- GetAllPlacementGroup request count: 0
- WaitPlacementGroupUntilReady request count: 0
- GetNamedPlacementGroup request count: 0
- Scheduling pending placement group count: 0
- Registered placement groups count: 0
- Named placement group count: 0
- Pending placement groups count: 0
- Infeasible placement groups count: 0
Publisher:
[runtime env manager] ID to URIs table:
[runtime env manager] URIs reference table:
GcsTaskManager:
-Total num task events reported: 0
-Total num status task events dropped: 0
-Total num profile events dropped: 0
-Current num of task events stored: 0
-Total num of actor creation tasks: 0
-Total num of actor tasks: 0
-Total num of normal tasks: 0
-Total num of driver tasks: 0
GcsAutoscalerStateManager:
- last_seen_autoscaler_state_version_: 0
- last_cluster_resource_state_version_: 0
- pending demands:
[2025-01-15 18:17:48,041 I 533898 533898] (gcs_server) gcs_server.cc:843: Main service Event stats:
Global stats: 25 total (5 active)
Queueing time: mean = 93.160 ms, max = 257.492 ms, min = 1.915 us, total = 2.329 s
Execution time: mean = 10.381 ms, total = 259.528 ms
Event stats:
GcsInMemoryStore.Put - 9 total (0 active), Execution time: mean = 28.613 ms, total = 257.519 ms, Queueing time: mean = 199.498 ms, max = 256.865 ms, min = 1.915 us, total = 1.795 s
GcsInMemoryStore.GetAll - 5 total (0 active), Execution time: mean = 8.645 us, total = 43.227 us, Queueing time: mean = 52.025 us, max = 55.901 us, min = 47.949 us, total = 260.123 us
PeriodicalRunner.RunFnPeriodically - 4 total (2 active, 1 running), Execution time: mean = 2.804 us, total = 11.217 us, Queueing time: mean = 128.716 ms, max = 257.492 ms, min = 257.372 ms, total = 514.864 ms
event_loop_lag_probe - 2 total (0 active), Execution time: mean = 7.656 us, total = 15.312 us, Queueing time: mean = 7.527 ms, max = 14.854 ms, min = 200.542 us, total = 15.054 ms
GcsInMemoryStore.Get - 1 total (0 active), Execution time: mean = 12.623 us, total = 12.623 us, Queueing time: mean = 2.902 us, max = 2.902 us, min = 2.902 us, total = 2.902 us
RayletLoadPulled - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
NodeInfoGcsService.grpc_server.GetClusterId - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
NodeInfoGcsService.grpc_server.GetClusterId.HandleRequestImpl - 1 total (0 active), Execution time: mean = 1.927 ms, total = 1.927 ms, Queueing time: mean = 3.338 ms, max = 3.338 ms, min = 3.338 ms, total = 3.338 ms
ClusterResourceManager.ResetRemoteNodeView - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[2025-01-15 18:17:48,041 I 533898 533898] (gcs_server) gcs_server.cc:847: task_io_context Event stats:
Global stats: 5 total (1 active)
Queueing time: mean = 359.416 us, max = 866.739 us, min = 13.707 us, total = 1.797 ms
Execution time: mean = 710.139 us, total = 3.551 ms
Event stats:
event_loop_lag_probe - 3 total (0 active), Execution time: mean = 1.179 ms, total = 3.536 ms, Queueing time: mean = 565.575 us, max = 866.739 us, min = 13.707 us, total = 1.697 ms
PeriodicalRunner.RunFnPeriodically - 1 total (0 active), Execution time: mean = 14.460 us, total = 14.460 us, Queueing time: mean = 100.357 us, max = 100.357 us, min = 100.357 us, total = 100.357 us
GcsTaskManager.GcJobSummary - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[2025-01-15 18:17:48,041 I 533898 533898] (gcs_server) gcs_server.cc:847: pubsub_io_context Event stats:
Global stats: 5 total (1 active)
Queueing time: mean = 1.604 ms, max = 7.861 ms, min = 9.658 us, total = 8.022 ms
Execution time: mean = 61.565 us, total = 307.826 us
Event stats:
event_loop_lag_probe - 3 total (0 active), Execution time: mean = 90.467 us, total = 271.401 us, Queueing time: mean = 2.634 ms, max = 7.861 ms, min = 9.658 us, total = 7.903 ms
PeriodicalRunner.RunFnPeriodically - 1 total (0 active), Execution time: mean = 36.425 us, total = 36.425 us, Queueing time: mean = 119.146 us, max = 119.146 us, min = 119.146 us, total = 119.146 us
Publisher.CheckDeadSubscribers - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[2025-01-15 18:17:48,041 I 533898 533898] (gcs_server) gcs_server.cc:847: ray_syncer_io_context Event stats:
Global stats: 5 total (0 active)
Queueing time: mean = 1.380 ms, max = 6.635 ms, min = 15.009 us, total = 6.900 ms
Execution time: mean = 118.320 us, total = 591.600 us
Event stats:
event_loop_lag_probe - 3 total (0 active), Execution time: mean = 196.402 us, total = 589.207 us, Queueing time: mean = 2.234 ms, max = 6.635 ms, min = 15.009 us, total = 6.702 ms
RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.196 us, total = 2.393 us, Queueing time: mean = 99.183 us, max = 99.987 us, min = 98.379 us, total = 198.366 us
[2025-01-15 18:17:50,333 I 533898 533898] (gcs_server) gcs_node_manager.cc:85: Registering node info, address = 192.168.0.2, node name = 192.168.0.2 node_id=e6bd14444f2c0342ac093772e695e167dc22d9d6b4f6ebde7a35c8fc
[2025-01-15 18:17:50,333 I 533898 533898] (gcs_server) gcs_node_manager.cc:91: Finished registering node info, address = 192.168.0.2, node name = 192.168.0.2, is_head_node = 1 node_id=e6bd14444f2c0342ac093772e695e167dc22d9d6b4f6ebde7a35c8fc
[2025-01-15 18:17:50,333 I 533898 533898] (gcs_server) gcs_placement_group_manager.cc:819: A new node: e6bd14444f2c0342ac093772e695e167dc22d9d6b4f6ebde7a35c8fc registered, will try to reschedule all the infeasible placement groups.
[2025-01-15 18:17:50,341 I 533898 533984] (gcs_server) ray_syncer.cc:377: Get connection node_id=e6bd14444f2c0342ac093772e695e167dc22d9d6b4f6ebde7a35c8fc
[2025-01-15 18:17:51,305 I 533898 533898] (gcs_server) gcs_job_manager.cc:90: Adding job, job id = 01000000, driver pid = 533830
[2025-01-15 18:17:51,305 I 533898 533898] (gcs_server) gcs_job_manager.cc:111: Finished adding job, job id = 01000000, driver pid = 533830
[2025-01-15 18:17:57,618 I 533898 533898] (gcs_server) gcs_job_manager.cc:149: Finished marking job state, job id = 01000000
[2025-01-15 18:17:57,714 I 533898 533898] (gcs_server) gcs_node_manager.cc:366: Removing node, node name = 192.168.0.2, death reason = EXPECTED_TERMINATION, death message = received SIGTERM node_id=e6bd14444f2c0342ac093772e695e167dc22d9d6b4f6ebde7a35c8fc
[2025-01-15 18:17:57,715 I 533898 533898] (gcs_server) gcs_placement_group_manager.cc:789: Node failed, rescheduling the placement groups on the dead node. node_id=e6bd14444f2c0342ac093772e695e167dc22d9d6b4f6ebde7a35c8fc
[2025-01-15 18:17:57,715 I 533898 533898] (gcs_server) gcs_actor_manager.cc:1274: Node failed, reconstructing actors. node_id=e6bd14444f2c0342ac093772e695e167dc22d9d6b4f6ebde7a35c8fc
[2025-01-15 18:17:57,715 I 533898 533898] (gcs_server) gcs_job_manager.cc:454: Node failed, mark all jobs from this node as finished node_id=e6bd14444f2c0342ac093772e695e167dc22d9d6b4f6ebde7a35c8fc
[2025-01-15 18:17:57,785 W 533898 533921] (gcs_server) metric_exporter.cc:105: [1] Export metrics to agent failed: RpcError: RPC Error message: failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:49953: Failed to connect to remote host: Connection refused; RPC Error details: . This won't affect Ray, but you can lose metrics from the cluster.
[2025-01-15 18:17:57,940 I 533898 533947] (gcs_server) ray_syncer-inl.h:318: Failed to read the message from: e6bd14444f2c0342ac093772e695e167dc22d9d6b4f6ebde7a35c8fc
[2025-01-15 18:17:57,940 I 533898 533947] (gcs_server) ray_syncer.cc:373: Connection is broken. node_id=e6bd14444f2c0342ac093772e695e167dc22d9d6b4f6ebde7a35c8fc
[2025-01-15 18:17:57,978 I 533898 533898] (gcs_server) gcs_server_main.cc:130: GCS server received SIGTERM, shutting down...
[2025-01-15 18:17:57,980 I 533898 533898] (gcs_server) gcs_server.cc:267: Stopping GCS server.
[2025-01-15 18:17:58,056 I 533898 533898] (gcs_server) gcs_server.cc:284: GCS server stopped.
[2025-01-15 18:17:58,057 I 533898 533898] (gcs_server) io_service_pool.cc:47: IOServicePool is stopped.
[2025-01-15 18:17:58,086 I 533898 533898] (gcs_server) stats.h:120: Stats module has shutdown.