2025-01-21 06:02:59,899 INFO monitor.py:688 -- Starting monitor using ray installation: /usr/local/lib/python3.10/dist-packages/ray/__init__.py
2025-01-21 06:02:59,899 INFO monitor.py:689 -- Ray version: 2.40.0
2025-01-21 06:02:59,899 INFO monitor.py:690 -- Ray commit: 22541c38dbef25286cd6d19f1c151bf4fd62f2ed
2025-01-21 06:02:59,899 INFO monitor.py:691 -- Monitor started with command: ['/usr/local/lib/python3.10/dist-packages/ray/autoscaler/_private/monitor.py', '--logs-dir=/tmp/ray/session_2025-01-21_06-02-58_965579_21983/logs', '--logging-rotate-bytes=536870912', '--logging-rotate-backup-count=5', '--gcs-address=192.168.0.2:44745', '--monitor-ip=192.168.0.2']
2025-01-21 06:02:59,908 INFO monitor.py:159 -- session_name: session_2025-01-21_06-02-58_965579_21983
2025-01-21 06:02:59,910 INFO monitor.py:191 -- Starting autoscaler metrics server on port 44217
2025-01-21 06:02:59,910 ERROR monitor.py:208 -- An exception occurred while starting the metrics server.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ray/autoscaler/_private/monitor.py", line 197, in __init__
    prometheus_client.start_http_server(
  File "/usr/local/lib/python3.10/dist-packages/prometheus_client/exposition.py", line 170, in start_wsgi_server
    httpd = make_server(addr, port, app, TmpServer, handler_class=_SilentHandler)
  File "/usr/lib/python3.10/wsgiref/simple_server.py", line 154, in make_server
    server = server_class((host, port), handler_class)
  File "/usr/lib/python3.10/socketserver.py", line 452, in __init__
    self.server_bind()
  File "/usr/lib/python3.10/wsgiref/simple_server.py", line 50, in server_bind
    HTTPServer.server_bind(self)
  File "/usr/lib/python3.10/http/server.py", line 137, in server_bind
    socketserver.TCPServer.server_bind(self)
  File "/usr/lib/python3.10/socketserver.py", line 466, in server_bind
    self.socket.bind(self.server_address)
OSError: [Errno 98] Address already in use
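The traceback ends in `socket.bind()`: errno 98 (`EADDRINUSE` on Linux) means some other socket already holds the autoscaler metrics port 44217. The monitor apparently keeps running afterwards, since the status updates continue, so the visible effect is a missing metrics endpoint. A minimal sketch for checking a port before starting Ray, assuming the server binds on all interfaces (`0.0.0.0`); `port_in_use` is a hypothetical helper, not a Ray or `prometheus_client` API:

```python
import errno
import socket


def port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if binding host:port fails with EADDRINUSE.

    Hypothetical helper: it repeats the bind() that failed in the
    traceback above; it is not part of Ray or prometheus_client.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # The server in the traceback is an http.server.HTTPServer subclass,
        # which sets SO_REUSEADDR; mirror that so sockets lingering in
        # TIME_WAIT are not reported as "in use".
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # e.g. EACCES or EADDRNOTAVAIL: a different problem
    return False
```

For this log, `port_in_use(44217)` returning `True` would confirm the conflict; a tool such as `ss -ltnp` can then show which process owns the port.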
2025-01-21 06:02:59,911 INFO monitor.py:216 -- Monitor: Started
2025-01-21 06:02:59,926 INFO autoscaler.py:280 -- disable_node_updaters:False
2025-01-21 06:02:59,926 INFO autoscaler.py:288 -- disable_launch_config_check:True
2025-01-21 06:02:59,926 INFO autoscaler.py:300 -- foreground_node_launch:False
2025-01-21 06:02:59,926 INFO autoscaler.py:310 -- worker_liveness_check:True
2025-01-21 06:02:59,926 INFO autoscaler.py:318 -- worker_rpc_drain:True
2025-01-21 06:02:59,928 INFO autoscaler.py:368 -- StandardAutoscaler: {'cluster_name': 'default', 'max_workers': 0, 'upscaling_speed': 1.0, 'docker': {}, 'idle_timeout_minutes': 0, 'provider': {'type': 'readonly', 'use_node_id_as_ip': True, 'disable_launch_config_check': True}, 'auth': {}, 'available_node_types': {'ray.head.default': {'resources': {}, 'node_config': {}, 'max_workers': 0}}, 'head_node_type': 'ray.head.default', 'file_mounts': {}, 'cluster_synced_files': [], 'file_mounts_sync_continuously': False, 'rsync_exclude': [], 'rsync_filter': [], 'initialization_commands': [], 'setup_commands': [], 'head_setup_commands': [], 'worker_setup_commands': [], 'head_start_ray_commands': [], 'worker_start_ray_commands': []}
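The `StandardAutoscaler:` line prints the effective cluster config as a Python dict literal, so it can be loaded back with `ast.literal_eval` to check settings such as `max_workers` (0 here) and the provider type (`readonly` here). A minimal sketch, assuming the line has exactly the `StandardAutoscaler: {...}` shape shown above; `summarize_autoscaler_config` is a hypothetical helper:

```python
import ast


def summarize_autoscaler_config(log_line: str) -> dict:
    """Extract a few fields from a 'StandardAutoscaler: {...}' log line.

    Hypothetical helper: assumes everything after 'StandardAutoscaler: '
    is a Python dict literal, as in the log line above.
    """
    marker = "StandardAutoscaler: "
    start = log_line.index(marker) + len(marker)
    # literal_eval only accepts literals (dicts, lists, str, numbers,
    # True/False/None), so it is safe to use on log text.
    config = ast.literal_eval(log_line[start:].strip())
    return {
        "cluster_name": config["cluster_name"],
        "max_workers": config["max_workers"],
        "provider_type": config["provider"]["type"],
        "node_types": sorted(config["available_node_types"]),
    }
```

Applied to the line above, this would report cluster `default`, zero workers, the `readonly` provider, and a single node type, `ray.head.default`.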
2025-01-21 06:02:59,930 INFO monitor.py:383 -- Autoscaler has not yet received load metrics. Waiting.
2025-01-21 06:03:04,938 INFO autoscaler.py:147 -- The autoscaler took 0.0 seconds to fetch the list of non-terminated nodes.
2025-01-21 06:03:04,939 INFO autoscaler.py:427 --
======== Autoscaler status: 2025-01-21 06:03:04.939603 ========
Node status
---------------------------------------------------------------
Active:
1 node_959d61ca307c5c3e7967dc4f62c340eb34c700e436feb4955e4f5877
Pending:
(no pending nodes)
Recent failures:
(no failures)

Resources
---------------------------------------------------------------
Usage:
0.0/20.0 CPU
0.0/2.0 GPU
0B/72.61GiB memory
0B/2.00GiB object_store_memory

Demands:
(no resource demands)
2025-01-21 06:03:04,942 INFO autoscaler.py:470 -- The autoscaler took 0.004 seconds to complete the update iteration.
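Each status block lists resource usage as `used/total name` lines under `Usage:`. Pulling those lines out makes it easier to follow changes across iterations; in this log, CPU usage goes from 0.0/20.0 to 1.0/20.0 at 06:03:14 and is back to 0.0/20.0 by 06:03:30. A minimal sketch, assuming the plain-text layout shown here; `parse_usage` is a hypothetical helper, not a Ray API, and values stay strings because memory figures carry units such as `GiB`:

```python
import re

# Matches usage lines such as "0.0/20.0 CPU" or "0B/72.61GiB memory".
USAGE_RE = re.compile(r"^(?P<used>\S+)/(?P<total>\S+)\s+(?P<resource>\S+)$")


def parse_usage(status_text: str) -> dict:
    """Map resource name -> (used, total) from the Usage: section of one status block."""
    usage = {}
    in_usage = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped == "Usage:":
            in_usage = True
            continue
        if not in_usage or not stripped:
            continue  # skip lines before "Usage:" and blank separators
        match = USAGE_RE.match(stripped)
        if match is None:
            break  # the next heading (e.g. "Demands:") ends the section
        usage[match["resource"]] = (match["used"], match["total"])
    return usage
```

Feeding it the text of the block above yields `('0.0', '20.0')` for `CPU` and `('0B', '72.61GiB')` for `memory`.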
2025-01-21 06:03:09,956 INFO autoscaler.py:147 -- The autoscaler took 0.0 seconds to fetch the list of non-terminated nodes.
2025-01-21 06:03:09,956 INFO autoscaler.py:427 --
======== Autoscaler status: 2025-01-21 06:03:09.956718 ========
Node status
---------------------------------------------------------------
Active:
1 node_959d61ca307c5c3e7967dc4f62c340eb34c700e436feb4955e4f5877
Pending:
(no pending nodes)
Recent failures:
(no failures)

Resources
---------------------------------------------------------------
Usage:
0.0/20.0 CPU
0.0/2.0 GPU
0B/72.61GiB memory
0B/2.00GiB object_store_memory

Demands:
(no resource demands)
2025-01-21 06:03:09,959 INFO autoscaler.py:470 -- The autoscaler took 0.004 seconds to complete the update iteration.
2025-01-21 06:03:14,973 INFO autoscaler.py:147 -- The autoscaler took 0.0 seconds to fetch the list of non-terminated nodes.
2025-01-21 06:03:14,973 INFO autoscaler.py:427 --
======== Autoscaler status: 2025-01-21 06:03:14.973657 ========
Node status
---------------------------------------------------------------
Active:
1 node_959d61ca307c5c3e7967dc4f62c340eb34c700e436feb4955e4f5877
Pending:
(no pending nodes)
Recent failures:
(no failures)

Resources
---------------------------------------------------------------
Usage:
1.0/20.0 CPU
0.0/2.0 GPU
0B/72.61GiB memory
0B/2.00GiB object_store_memory

Demands:
(no resource demands)
2025-01-21 06:03:14,976 INFO autoscaler.py:470 -- The autoscaler took 0.004 seconds to complete the update iteration.
2025-01-21 06:03:19,986 INFO autoscaler.py:147 -- The autoscaler took 0.0 seconds to fetch the list of non-terminated nodes.
2025-01-21 06:03:19,987 INFO autoscaler.py:427 --
======== Autoscaler status: 2025-01-21 06:03:19.987341 ========
Node status
---------------------------------------------------------------
Active:
1 node_959d61ca307c5c3e7967dc4f62c340eb34c700e436feb4955e4f5877
Pending:
(no pending nodes)
Recent failures:
(no failures)

Resources
---------------------------------------------------------------
Usage:
1.0/20.0 CPU
0.0/2.0 GPU
0B/72.61GiB memory
0B/2.00GiB object_store_memory

Demands:
(no resource demands)
2025-01-21 06:03:19,989 INFO autoscaler.py:470 -- The autoscaler took 0.003 seconds to complete the update iteration.
2025-01-21 06:03:24,998 INFO autoscaler.py:147 -- The autoscaler took 0.0 seconds to fetch the list of non-terminated nodes.
2025-01-21 06:03:24,999 INFO autoscaler.py:427 --
======== Autoscaler status: 2025-01-21 06:03:24.999501 ========
Node status
---------------------------------------------------------------
Active:
1 node_959d61ca307c5c3e7967dc4f62c340eb34c700e436feb4955e4f5877
Pending:
(no pending nodes)
Recent failures:
(no failures)

Resources
---------------------------------------------------------------
Usage:
1.0/20.0 CPU
0.0/2.0 GPU
0B/72.61GiB memory
0B/2.00GiB object_store_memory

Demands:
(no resource demands)
2025-01-21 06:03:25,001 INFO autoscaler.py:470 -- The autoscaler took 0.003 seconds to complete the update iteration.
2025-01-21 06:03:30,011 INFO autoscaler.py:147 -- The autoscaler took 0.0 seconds to fetch the list of non-terminated nodes.
2025-01-21 06:03:30,012 INFO autoscaler.py:427 --
======== Autoscaler status: 2025-01-21 06:03:30.012313 ========
Node status
---------------------------------------------------------------
Active:
1 node_959d61ca307c5c3e7967dc4f62c340eb34c700e436feb4955e4f5877
Pending:
(no pending nodes)
Recent failures:
(no failures)

Resources
---------------------------------------------------------------
Usage:
0.0/20.0 CPU
0.0/2.0 GPU
0B/72.61GiB memory
0B/2.00GiB object_store_memory

Demands:
(no resource demands)
2025-01-21 06:03:30,015 INFO autoscaler.py:470 -- The autoscaler took 0.003 seconds to complete the update iteration.
2025-01-21 06:03:35,026 INFO autoscaler.py:147 -- The autoscaler took 0.0 seconds to fetch the list of non-terminated nodes.
2025-01-21 06:03:35,027 INFO autoscaler.py:427 --
======== Autoscaler status: 2025-01-21 06:03:35.026879 ========
Node status
---------------------------------------------------------------
Active:
1 node_959d61ca307c5c3e7967dc4f62c340eb34c700e436feb4955e4f5877
Pending:
(no pending nodes)
Recent failures:
(no failures)

Resources
---------------------------------------------------------------
Usage:
0.0/20.0 CPU
0.0/2.0 GPU
0B/72.61GiB memory
0B/2.00GiB object_store_memory

Demands:
(no resource demands)
2025-01-21 06:03:35,029 INFO autoscaler.py:470 -- The autoscaler took 0.003 seconds to complete the update iteration.
2025-01-21 06:03:40,040 INFO autoscaler.py:147 -- The autoscaler took 0.0 seconds to fetch the list of non-terminated nodes.
2025-01-21 06:03:40,041 INFO autoscaler.py:427 --
======== Autoscaler status: 2025-01-21 06:03:40.040867 ========
Node status
---------------------------------------------------------------
Active:
1 node_959d61ca307c5c3e7967dc4f62c340eb34c700e436feb4955e4f5877
Pending:
(no pending nodes)
Recent failures:
(no failures)

Resources
---------------------------------------------------------------
Usage:
0.0/20.0 CPU
0.0/2.0 GPU
0B/72.61GiB memory
0B/2.00GiB object_store_memory

Demands:
(no resource demands)
2025-01-21 06:03:40,043 INFO autoscaler.py:470 -- The autoscaler took 0.003 seconds to complete the update iteration.
2025-01-21 06:03:45,051 INFO autoscaler.py:147 -- The autoscaler took 0.0 seconds to fetch the list of non-terminated nodes.
2025-01-21 06:03:45,052 INFO autoscaler.py:427 --
======== Autoscaler status: 2025-01-21 06:03:45.052321 ========
Node status
---------------------------------------------------------------
Active:
1 node_959d61ca307c5c3e7967dc4f62c340eb34c700e436feb4955e4f5877
Pending:
(no pending nodes)
Recent failures:
(no failures)

Resources
---------------------------------------------------------------
Usage:
0.0/20.0 CPU
0.0/2.0 GPU
0B/72.61GiB memory
0B/2.00GiB object_store_memory

Demands:
(no resource demands)
2025-01-21 06:03:45,054 INFO autoscaler.py:470 -- The autoscaler took 0.003 seconds to complete the update iteration.
2025-01-21 06:03:50,066 INFO autoscaler.py:147 -- The autoscaler took 0.0 seconds to fetch the list of non-terminated nodes.
2025-01-21 06:03:50,066 INFO autoscaler.py:427 --
======== Autoscaler status: 2025-01-21 06:03:50.066682 ========
Node status
---------------------------------------------------------------
Active:
1 node_959d61ca307c5c3e7967dc4f62c340eb34c700e436feb4955e4f5877
Pending:
(no pending nodes)
Recent failures:
(no failures)

Resources
---------------------------------------------------------------
Usage:
0.0/20.0 CPU
0.0/2.0 GPU
0B/72.61GiB memory
0B/2.00GiB object_store_memory

Demands:
(no resource demands)
2025-01-21 06:03:50,068 INFO autoscaler.py:470 -- The autoscaler took 0.002 seconds to complete the update iteration.
2025-01-21 06:03:55,078 INFO autoscaler.py:147 -- The autoscaler took 0.0 seconds to fetch the list of non-terminated nodes.
2025-01-21 06:03:55,079 INFO autoscaler.py:427 --
======== Autoscaler status: 2025-01-21 06:03:55.078964 ========
Node status
---------------------------------------------------------------
Active:
1 node_959d61ca307c5c3e7967dc4f62c340eb34c700e436feb4955e4f5877
Pending:
(no pending nodes)
Recent failures:
(no failures)

Resources
---------------------------------------------------------------
Usage:
0.0/20.0 CPU
0.0/2.0 GPU
0B/72.61GiB memory
0B/2.00GiB object_store_memory

Demands:
(no resource demands)
2025-01-21 06:03:55,081 INFO autoscaler.py:470 -- The autoscaler took 0.003 seconds to complete the update iteration.
2025-01-21 06:04:00,090 INFO autoscaler.py:147 -- The autoscaler took 0.0 seconds to fetch the list of non-terminated nodes.
2025-01-21 06:04:00,091 INFO autoscaler.py:427 --
======== Autoscaler status: 2025-01-21 06:04:00.091151 ========
Node status
---------------------------------------------------------------
Active:
1 node_959d61ca307c5c3e7967dc4f62c340eb34c700e436feb4955e4f5877
Pending:
(no pending nodes)
Recent failures:
(no failures)

Resources
---------------------------------------------------------------
Usage:
0.0/20.0 CPU
0.0/2.0 GPU
0B/72.61GiB memory
0B/2.00GiB object_store_memory

Demands:
(no resource demands)
2025-01-21 06:04:00,092 INFO autoscaler.py:470 -- The autoscaler took 0.002 seconds to complete the update iteration.
2025-01-21 06:04:05,100 INFO autoscaler.py:147 -- The autoscaler took 0.0 seconds to fetch the list of non-terminated nodes.
2025-01-21 06:04:05,101 INFO autoscaler.py:427 --
======== Autoscaler status: 2025-01-21 06:04:05.101323 ========
Node status
---------------------------------------------------------------
Active:
1 node_959d61ca307c5c3e7967dc4f62c340eb34c700e436feb4955e4f5877
Pending:
(no pending nodes)
Recent failures:
(no failures)

Resources
---------------------------------------------------------------
Usage:
0.0/20.0 CPU
0.0/2.0 GPU
0B/72.61GiB memory
0B/2.00GiB object_store_memory

Demands:
(no resource demands)
2025-01-21 06:04:05,103 INFO autoscaler.py:470 -- The autoscaler took 0.003 seconds to complete the update iteration.