[2024-09-01 06:28:21,632][00307] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-09-01 06:28:21,635][00307] Rollout worker 0 uses device cpu
[2024-09-01 06:28:21,638][00307] Rollout worker 1 uses device cpu
[2024-09-01 06:28:21,639][00307] Rollout worker 2 uses device cpu
[2024-09-01 06:28:21,641][00307] Rollout worker 3 uses device cpu
[2024-09-01 06:28:21,643][00307] Rollout worker 4 uses device cpu
[2024-09-01 06:28:21,645][00307] Rollout worker 5 uses device cpu
[2024-09-01 06:28:21,646][00307] Rollout worker 6 uses device cpu
[2024-09-01 06:28:21,648][00307] Rollout worker 7 uses device cpu
[2024-09-01 06:32:11,949][00307] Environment doom_basic already registered, overwriting...
[2024-09-01 06:32:11,952][00307] Environment doom_two_colors_easy already registered, overwriting...
[2024-09-01 06:32:11,955][00307] Environment doom_two_colors_hard already registered, overwriting...
[2024-09-01 06:32:11,956][00307] Environment doom_dm already registered, overwriting...
[2024-09-01 06:32:11,958][00307] Environment doom_dwango5 already registered, overwriting...
[2024-09-01 06:32:11,961][00307] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2024-09-01 06:32:11,962][00307] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2024-09-01 06:32:11,964][00307] Environment doom_my_way_home already registered, overwriting...
[2024-09-01 06:32:11,967][00307] Environment doom_deadly_corridor already registered, overwriting...
[2024-09-01 06:32:11,968][00307] Environment doom_defend_the_center already registered, overwriting...
[2024-09-01 06:32:11,969][00307] Environment doom_defend_the_line already registered, overwriting...
[2024-09-01 06:32:11,971][00307] Environment doom_health_gathering already registered, overwriting...
[2024-09-01 06:32:11,973][00307] Environment doom_health_gathering_supreme already registered, overwriting...
[2024-09-01 06:32:11,974][00307] Environment doom_battle already registered, overwriting...
[2024-09-01 06:32:11,976][00307] Environment doom_battle2 already registered, overwriting...
[2024-09-01 06:32:11,977][00307] Environment doom_duel_bots already registered, overwriting...
[2024-09-01 06:32:11,979][00307] Environment doom_deathmatch_bots already registered, overwriting...
[2024-09-01 06:32:11,981][00307] Environment doom_duel already registered, overwriting...
[2024-09-01 06:32:11,982][00307] Environment doom_deathmatch_full already registered, overwriting...
[2024-09-01 06:32:11,984][00307] Environment doom_benchmark already registered, overwriting...
[2024-09-01 06:32:11,985][00307] register_encoder_factory:
[2024-09-01 06:32:12,019][00307] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-01 06:32:12,021][00307] Overriding arg 'device' with value 'cpu' passed from command line
[2024-09-01 06:32:12,035][00307] Experiment dir /content/train_dir/default_experiment already exists!
[2024-09-01 06:32:12,039][00307] Resuming existing experiment from /content/train_dir/default_experiment...
[2024-09-01 06:32:12,040][00307] Weights and Biases integration disabled
[2024-09-01 06:32:12,046][00307] Environment var CUDA_VISIBLE_DEVICES is
[2024-09-01 06:32:14,625][00307] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=cpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=4000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
git_hash=unknown
git_repo_name=not a git repository
[2024-09-01 06:32:14,627][00307] Saving configuration to /content/train_dir/default_experiment/config.json...
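The command_line and cli_args entries above, together with the overridden device and the default train_dir/experiment values, are enough to reproduce this run. A minimal sketch of the equivalent invocation, assuming the Sample Factory 2.x VizDoom example entry point (the module path sf_examples.vizdoom.train_vizdoom is an assumption on our part; the log does not record it):

    # Hypothetical reproduction command; every flag mirrors a value in the config dump above.
    python -m sf_examples.vizdoom.train_vizdoom \
        --env=doom_health_gathering_supreme \
        --num_workers=8 \
        --num_envs_per_worker=4 \
        --train_for_env_steps=4000000 \
        --device=cpu \
        --train_dir=/content/train_dir \
        --experiment=default_experiment

Because restart_behavior=resume, re-running the same command against an existing experiment directory continues from the latest checkpoint, which is exactly what the "Resuming existing experiment" entry above reports.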
[2024-09-01 06:32:14,636][00307] Rollout worker 0 uses device cpu
[2024-09-01 06:32:14,638][00307] Rollout worker 1 uses device cpu
[2024-09-01 06:32:14,639][00307] Rollout worker 2 uses device cpu
[2024-09-01 06:32:14,645][00307] Rollout worker 3 uses device cpu
[2024-09-01 06:32:14,646][00307] Rollout worker 4 uses device cpu
[2024-09-01 06:32:14,648][00307] Rollout worker 5 uses device cpu
[2024-09-01 06:32:14,650][00307] Rollout worker 6 uses device cpu
[2024-09-01 06:32:14,652][00307] Rollout worker 7 uses device cpu
[2024-09-01 06:32:14,816][00307] InferenceWorker_p0-w0: min num requests: 2
[2024-09-01 06:32:14,859][00307] Starting all processes...
[2024-09-01 06:32:14,860][00307] Starting process learner_proc0
[2024-09-01 06:32:14,918][00307] Starting all processes...
[2024-09-01 06:32:14,930][00307] Starting process inference_proc0-0
[2024-09-01 06:32:14,931][00307] Starting process rollout_proc0
[2024-09-01 06:32:14,932][00307] Starting process rollout_proc1
[2024-09-01 06:32:14,932][00307] Starting process rollout_proc2
[2024-09-01 06:32:14,932][00307] Starting process rollout_proc3
[2024-09-01 06:32:14,932][00307] Starting process rollout_proc4
[2024-09-01 06:32:14,932][00307] Starting process rollout_proc5
[2024-09-01 06:32:14,932][00307] Starting process rollout_proc6
[2024-09-01 06:32:14,932][00307] Starting process rollout_proc7
[2024-09-01 06:32:30,958][04801] Starting seed is not provided
[2024-09-01 06:32:30,958][04801] Initializing actor-critic model on device cpu
[2024-09-01 06:32:30,959][04801] RunningMeanStd input shape: (3, 72, 128)
[2024-09-01 06:32:30,970][04801] RunningMeanStd input shape: (1,)
[2024-09-01 06:32:31,128][04801] ConvEncoder: input_channels=3
[2024-09-01 06:32:31,184][04820] Worker 6 uses CPU cores [0]
[2024-09-01 06:32:31,432][04817] Worker 1 uses CPU cores [1]
[2024-09-01 06:32:31,551][04816] Worker 2 uses CPU cores [0]
[2024-09-01 06:32:31,605][04818] Worker 3 uses CPU cores [1]
[2024-09-01 06:32:31,642][04819] Worker 4 uses CPU cores [0]
[2024-09-01 06:32:31,652][04821] Worker 5 uses CPU cores [1]
[2024-09-01 06:32:31,673][04822] Worker 7 uses CPU cores [1]
[2024-09-01 06:32:31,699][04815] Worker 0 uses CPU cores [0]
[2024-09-01 06:32:31,803][04801] Conv encoder output size: 512
[2024-09-01 06:32:31,803][04801] Policy head output size: 512
[2024-09-01 06:32:31,829][04801] Created Actor Critic model with architecture:
[2024-09-01 06:32:31,829][04801] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-09-01 06:32:32,452][04801] Using optimizer
[2024-09-01 06:32:32,454][04801] No checkpoints found
[2024-09-01 06:32:32,454][04801] Did not load from checkpoint, starting from scratch!
[2024-09-01 06:32:32,455][04801] Initialized policy 0 weights for model version 0
[2024-09-01 06:32:32,458][04801] LearnerWorker_p0 finished initialization!
[2024-09-01 06:32:32,467][04814] RunningMeanStd input shape: (3, 72, 128)
[2024-09-01 06:32:32,470][04814] RunningMeanStd input shape: (1,)
[2024-09-01 06:32:32,495][04814] ConvEncoder: input_channels=3
[2024-09-01 06:32:32,705][04814] Conv encoder output size: 512
[2024-09-01 06:32:32,706][04814] Policy head output size: 512
[2024-09-01 06:32:32,736][00307] Inference worker 0-0 is ready!
[2024-09-01 06:32:32,738][00307] All inference workers are ready! Signal rollout workers to start!
[2024-09-01 06:32:32,884][04815] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 06:32:32,888][04820] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 06:32:32,882][04816] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 06:32:32,889][04819] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 06:32:32,892][04817] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 06:32:32,897][04821] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 06:32:32,899][04822] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 06:32:32,908][04818] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 06:32:34,051][04818] Decorrelating experience for 0 frames...
[2024-09-01 06:32:34,050][04815] Decorrelating experience for 0 frames...
[2024-09-01 06:32:34,108][04822] Decorrelating experience for 0 frames...
[2024-09-01 06:32:34,540][04815] Decorrelating experience for 32 frames...
[2024-09-01 06:32:34,811][00307] Heartbeat connected on Batcher_0
[2024-09-01 06:32:34,814][00307] Heartbeat connected on LearnerWorker_p0
[2024-09-01 06:32:34,847][00307] Heartbeat connected on InferenceWorker_p0-w0
[2024-09-01 06:32:34,993][04818] Decorrelating experience for 32 frames...
[2024-09-01 06:32:35,096][04822] Decorrelating experience for 32 frames...
[2024-09-01 06:32:35,237][04815] Decorrelating experience for 64 frames...
[2024-09-01 06:32:36,072][04818] Decorrelating experience for 64 frames...
[2024-09-01 06:32:36,283][04822] Decorrelating experience for 64 frames...
[2024-09-01 06:32:36,391][04819] Decorrelating experience for 0 frames...
[2024-09-01 06:32:36,506][04816] Decorrelating experience for 0 frames...
[2024-09-01 06:32:36,673][04815] Decorrelating experience for 96 frames...
[2024-09-01 06:32:36,888][00307] Heartbeat connected on RolloutWorker_w0
[2024-09-01 06:32:37,046][00307] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-01 06:32:37,241][04818] Decorrelating experience for 96 frames...
[2024-09-01 06:32:37,431][00307] Heartbeat connected on RolloutWorker_w3
[2024-09-01 06:32:37,467][04822] Decorrelating experience for 96 frames...
[2024-09-01 06:32:37,622][00307] Heartbeat connected on RolloutWorker_w7
[2024-09-01 06:32:38,118][04819] Decorrelating experience for 32 frames...
[2024-09-01 06:32:38,690][04816] Decorrelating experience for 32 frames...
[2024-09-01 06:32:42,049][00307] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 220.7. Samples: 1104. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-01 06:32:42,053][00307] Avg episode reward: [(0, '2.930')]
[2024-09-01 06:32:42,324][04819] Decorrelating experience for 64 frames...
[2024-09-01 06:32:43,242][04816] Decorrelating experience for 64 frames...
[2024-09-01 06:32:47,046][00307] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 153.6. Samples: 1536. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-01 06:32:47,055][00307] Avg episode reward: [(0, '3.385')]
[2024-09-01 06:32:47,961][04819] Decorrelating experience for 96 frames...
[2024-09-01 06:32:48,680][00307] Heartbeat connected on RolloutWorker_w4
[2024-09-01 06:32:48,727][04801] Signal inference workers to stop experience collection...
[2024-09-01 06:32:48,758][04814] InferenceWorker_p0-w0: stopping experience collection
[2024-09-01 06:32:48,805][04816] Decorrelating experience for 96 frames...
[2024-09-01 06:32:49,063][00307] Heartbeat connected on RolloutWorker_w2
[2024-09-01 06:32:49,189][04801] Signal inference workers to resume experience collection...
[2024-09-01 06:32:49,191][04814] InferenceWorker_p0-w0: resuming experience collection
[2024-09-01 06:32:52,046][00307] Fps is (10 sec: 409.7, 60 sec: 273.1, 300 sec: 273.1). Total num frames: 4096. Throughput: 0: 168.7. Samples: 2530. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-09-01 06:32:52,049][00307] Avg episode reward: [(0, '3.410')]
[2024-09-01 06:32:57,046][00307] Fps is (10 sec: 819.2, 60 sec: 409.6, 300 sec: 409.6). Total num frames: 8192. Throughput: 0: 210.1. Samples: 4202. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-09-01 06:32:57,048][00307] Avg episode reward: [(0, '3.576')]
[2024-09-01 06:33:02,046][00307] Fps is (10 sec: 819.2, 60 sec: 491.5, 300 sec: 491.5). Total num frames: 12288. Throughput: 0: 182.2. Samples: 4556. Policy #0 lag: (min: 1.0, avg: 1.1, max: 2.0)
[2024-09-01 06:33:02,053][00307] Avg episode reward: [(0, '3.645')]
[2024-09-01 06:33:07,046][00307] Fps is (10 sec: 819.2, 60 sec: 546.1, 300 sec: 546.1). Total num frames: 16384. Throughput: 0: 198.5. Samples: 5956. Policy #0 lag: (min: 1.0, avg: 1.1, max: 2.0)
[2024-09-01 06:33:07,049][00307] Avg episode reward: [(0, '3.833')]
[2024-09-01 06:33:12,046][00307] Fps is (10 sec: 1228.8, 60 sec: 702.2, 300 sec: 702.2). Total num frames: 24576. Throughput: 0: 214.1. Samples: 7492. Policy #0 lag: (min: 1.0, avg: 1.1, max: 2.0)
[2024-09-01 06:33:12,060][00307] Avg episode reward: [(0, '3.930')]
[2024-09-01 06:33:17,046][00307] Fps is (10 sec: 819.2, 60 sec: 614.4, 300 sec: 614.4). Total num frames: 24576. Throughput: 0: 198.3. Samples: 7932. Policy #0 lag: (min: 1.0, avg: 1.1, max: 2.0)
[2024-09-01 06:33:17,049][00307] Avg episode reward: [(0, '4.062')]
[2024-09-01 06:33:22,047][00307] Fps is (10 sec: 409.6, 60 sec: 637.2, 300 sec: 637.2). Total num frames: 28672. Throughput: 0: 205.3. Samples: 9238. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:33:22,050][00307] Avg episode reward: [(0, '4.058')]
[2024-09-01 06:33:27,046][00307] Fps is (10 sec: 1228.8, 60 sec: 737.3, 300 sec: 737.3). Total num frames: 36864. Throughput: 0: 213.1. Samples: 10694. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:33:27,049][00307] Avg episode reward: [(0, '4.098')]
[2024-09-01 06:33:30,210][04814] Updated weights for policy 0, policy_version 10 (0.1260)
[2024-09-01 06:33:32,046][00307] Fps is (10 sec: 1228.8, 60 sec: 744.7, 300 sec: 744.7). Total num frames: 40960. Throughput: 0: 220.6. Samples: 11464. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:33:32,054][00307] Avg episode reward: [(0, '4.198')]
[2024-09-01 06:33:37,046][00307] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 750.9). Total num frames: 45056. Throughput: 0: 222.0. Samples: 12518. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:33:37,054][00307] Avg episode reward: [(0, '4.202')]
[2024-09-01 06:33:42,046][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 756.2). Total num frames: 49152. Throughput: 0: 216.3. Samples: 13936. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:33:42,054][00307] Avg episode reward: [(0, '4.294')]
[2024-09-01 06:33:47,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 760.7). Total num frames: 53248. Throughput: 0: 223.5. Samples: 14614. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 06:33:47,051][00307] Avg episode reward: [(0, '4.265')]
[2024-09-01 06:33:52,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 764.6). Total num frames: 57344. Throughput: 0: 220.5. Samples: 15878. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 06:33:52,058][00307] Avg episode reward: [(0, '4.265')]
[2024-09-01 06:33:57,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 768.0). Total num frames: 61440. Throughput: 0: 216.0. Samples: 17212. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:33:57,051][00307] Avg episode reward: [(0, '4.275')]
[2024-09-01 06:34:02,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 771.0). Total num frames: 65536. Throughput: 0: 222.7. Samples: 17954. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:34:02,051][00307] Avg episode reward: [(0, '4.296')]
[2024-09-01 06:34:07,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 773.7). Total num frames: 69632. Throughput: 0: 224.9. Samples: 19358. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 06:34:07,048][00307] Avg episode reward: [(0, '4.329')]
[2024-09-01 06:34:12,046][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 776.1). Total num frames: 73728. Throughput: 0: 221.7. Samples: 20672. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 06:34:12,054][00307] Avg episode reward: [(0, '4.347')]
[2024-09-01 06:34:17,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 778.2). Total num frames: 77824. Throughput: 0: 218.9. Samples: 21316. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:34:17,048][00307] Avg episode reward: [(0, '4.376')]
[2024-09-01 06:34:17,118][04801] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000020_81920.pth...
[2024-09-01 06:34:17,123][04814] Updated weights for policy 0, policy_version 20 (0.1022)
[2024-09-01 06:34:22,046][00307] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 819.2). Total num frames: 86016. Throughput: 0: 227.2. Samples: 22744. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:34:22,050][00307] Avg episode reward: [(0, '4.497')]
[2024-09-01 06:34:27,046][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 782.0). Total num frames: 86016. Throughput: 0: 220.0. Samples: 23836. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:34:27,054][00307] Avg episode reward: [(0, '4.517')]
[2024-09-01 06:34:32,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 94208. Throughput: 0: 221.6. Samples: 24588. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:34:32,050][00307] Avg episode reward: [(0, '4.592')]
[2024-09-01 06:34:37,046][00307] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 98304. Throughput: 0: 225.8. Samples: 26040. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:34:37,050][00307] Avg episode reward: [(0, '4.547')]
[2024-09-01 06:34:42,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 102400. Throughput: 0: 220.1. Samples: 27116. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:34:42,056][00307] Avg episode reward: [(0, '4.481')]
[2024-09-01 06:34:45,297][04801] Saving new best policy, reward=4.481!
[2024-09-01 06:34:47,048][00307] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 819.2). Total num frames: 106496. Throughput: 0: 222.2. Samples: 27954. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:34:47,056][00307] Avg episode reward: [(0, '4.419')]
[2024-09-01 06:34:52,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 110592. Throughput: 0: 225.6. Samples: 29512. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:34:52,054][00307] Avg episode reward: [(0, '4.413')]
[2024-09-01 06:34:57,046][00307] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 114688. Throughput: 0: 221.4. Samples: 30636. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:34:57,058][00307] Avg episode reward: [(0, '4.383')]
[2024-09-01 06:35:02,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 118784. Throughput: 0: 217.9. Samples: 31120. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 06:35:02,052][00307] Avg episode reward: [(0, '4.296')]
[2024-09-01 06:35:03,411][04814] Updated weights for policy 0, policy_version 30 (0.1001)
[2024-09-01 06:35:07,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 122880. Throughput: 0: 224.2. Samples: 32834. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 06:35:07,056][00307] Avg episode reward: [(0, '4.225')]
[2024-09-01 06:35:12,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 126976. Throughput: 0: 226.1. Samples: 34010. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:35:12,056][00307] Avg episode reward: [(0, '4.193')]
[2024-09-01 06:35:17,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 131072. Throughput: 0: 220.5. Samples: 34512. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:35:17,054][00307] Avg episode reward: [(0, '4.151')]
[2024-09-01 06:35:22,046][00307] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 844.0). Total num frames: 139264. Throughput: 0: 222.5. Samples: 36052. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:35:22,052][00307] Avg episode reward: [(0, '4.206')]
[2024-09-01 06:35:27,046][00307] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 843.3). Total num frames: 143360. Throughput: 0: 231.2. Samples: 37518. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:35:27,052][00307] Avg episode reward: [(0, '4.157')]
[2024-09-01 06:35:32,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 842.6). Total num frames: 147456. Throughput: 0: 224.9. Samples: 38076. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:35:32,061][00307] Avg episode reward: [(0, '4.177')]
[2024-09-01 06:35:37,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 842.0). Total num frames: 151552. Throughput: 0: 217.6. Samples: 39304. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:35:37,049][00307] Avg episode reward: [(0, '4.312')]
[2024-09-01 06:35:42,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 841.3). Total num frames: 155648. Throughput: 0: 225.2. Samples: 40768. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:35:42,049][00307] Avg episode reward: [(0, '4.367')]
[2024-09-01 06:35:47,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 840.8). Total num frames: 159744. Throughput: 0: 228.6. Samples: 41406. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 06:35:47,053][00307] Avg episode reward: [(0, '4.360')]
[2024-09-01 06:35:49,882][04814] Updated weights for policy 0, policy_version 40 (0.0538)
[2024-09-01 06:35:52,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 840.2). Total num frames: 163840. Throughput: 0: 217.3. Samples: 42614. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 06:35:52,052][00307] Avg episode reward: [(0, '4.431')]
[2024-09-01 06:35:57,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 839.7). Total num frames: 167936. Throughput: 0: 229.9. Samples: 44354. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:35:57,055][00307] Avg episode reward: [(0, '4.412')]
[2024-09-01 06:36:02,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 839.2). Total num frames: 172032. Throughput: 0: 235.3. Samples: 45102. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:36:02,049][00307] Avg episode reward: [(0, '4.415')]
[2024-09-01 06:36:07,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 838.7). Total num frames: 176128. Throughput: 0: 216.0. Samples: 45774. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:36:07,049][00307] Avg episode reward: [(0, '4.500')]
[2024-09-01 06:36:07,837][04801] Saving new best policy, reward=4.500!
[2024-09-01 06:36:12,046][00307] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 857.3). Total num frames: 184320. Throughput: 0: 219.6. Samples: 47400. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:36:12,054][00307] Avg episode reward: [(0, '4.486')]
[2024-09-01 06:36:16,073][04801] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000046_188416.pth...
[2024-09-01 06:36:17,047][00307] Fps is (10 sec: 1228.7, 60 sec: 955.7, 300 sec: 856.4). Total num frames: 188416. Throughput: 0: 225.0. Samples: 48202. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:36:17,052][00307] Avg episode reward: [(0, '4.546')]
[2024-09-01 06:36:22,046][00307] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 837.4). Total num frames: 188416. Throughput: 0: 224.3. Samples: 49396. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:36:22,053][00307] Avg episode reward: [(0, '4.441')]
[2024-09-01 06:36:22,096][04801] Saving new best policy, reward=4.546!
[2024-09-01 06:36:27,046][00307] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 854.8). Total num frames: 196608. Throughput: 0: 217.7. Samples: 50564. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:36:27,049][00307] Avg episode reward: [(0, '4.392')]
[2024-09-01 06:36:32,046][00307] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 854.1). Total num frames: 200704. Throughput: 0: 223.5. Samples: 51464. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:36:32,049][00307] Avg episode reward: [(0, '4.385')]
[2024-09-01 06:36:35,001][04814] Updated weights for policy 0, policy_version 50 (0.1674)
[2024-09-01 06:36:37,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 853.3). Total num frames: 204800. Throughput: 0: 223.5. Samples: 52672. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:36:37,049][00307] Avg episode reward: [(0, '4.349')]
[2024-09-01 06:36:39,771][04801] Signal inference workers to stop experience collection... (50 times)
[2024-09-01 06:36:39,799][04814] InferenceWorker_p0-w0: stopping experience collection (50 times)
[2024-09-01 06:36:40,363][04801] Signal inference workers to resume experience collection... (50 times)
[2024-09-01 06:36:40,365][04814] InferenceWorker_p0-w0: resuming experience collection (50 times)
[2024-09-01 06:36:42,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 852.6). Total num frames: 208896. Throughput: 0: 215.9. Samples: 54068. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:36:42,049][00307] Avg episode reward: [(0, '4.409')]
[2024-09-01 06:36:47,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 852.0). Total num frames: 212992. Throughput: 0: 210.5. Samples: 54574. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:36:47,049][00307] Avg episode reward: [(0, '4.504')]
[2024-09-01 06:36:52,049][00307] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 851.3). Total num frames: 217088. Throughput: 0: 231.1. Samples: 56176. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:36:52,054][00307] Avg episode reward: [(0, '4.488')]
[2024-09-01 06:36:57,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 850.7). Total num frames: 221184. Throughput: 0: 222.2. Samples: 57400. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:36:57,052][00307] Avg episode reward: [(0, '4.590')]
[2024-09-01 06:36:58,239][04801] Saving new best policy, reward=4.590!
[2024-09-01 06:37:02,046][00307] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 850.1). Total num frames: 225280. Throughput: 0: 220.8. Samples: 58138. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:37:02,058][00307] Avg episode reward: [(0, '4.570')]
[2024-09-01 06:37:07,048][00307] Fps is (10 sec: 1228.6, 60 sec: 955.7, 300 sec: 864.7). Total num frames: 233472. Throughput: 0: 227.6. Samples: 59638. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:37:07,051][00307] Avg episode reward: [(0, '4.629')]
[2024-09-01 06:37:12,046][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 849.0). Total num frames: 233472. Throughput: 0: 223.4. Samples: 60618. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:37:12,051][00307] Avg episode reward: [(0, '4.599')]
[2024-09-01 06:37:12,466][04801] Saving new best policy, reward=4.629!
[2024-09-01 06:37:17,046][00307] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 863.1). Total num frames: 241664. Throughput: 0: 223.1. Samples: 61504. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:37:17,049][00307] Avg episode reward: [(0, '4.586')]
[2024-09-01 06:37:20,281][04814] Updated weights for policy 0, policy_version 60 (0.1080)
[2024-09-01 06:37:22,046][00307] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 862.3). Total num frames: 245760. Throughput: 0: 228.2. Samples: 62940. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:37:22,055][00307] Avg episode reward: [(0, '4.757')]
[2024-09-01 06:37:25,355][04801] Saving new best policy, reward=4.757!
[2024-09-01 06:37:27,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 861.6). Total num frames: 249856. Throughput: 0: 221.5. Samples: 64034. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:37:27,050][00307] Avg episode reward: [(0, '4.803')]
[2024-09-01 06:37:30,324][04801] Saving new best policy, reward=4.803!
[2024-09-01 06:37:32,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 253952. Throughput: 0: 227.7. Samples: 64822. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:37:32,049][00307] Avg episode reward: [(0, '4.721')]
[2024-09-01 06:37:37,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 258048. Throughput: 0: 227.5. Samples: 66412. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:37:37,049][00307] Avg episode reward: [(0, '4.723')]
[2024-09-01 06:37:42,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 262144. Throughput: 0: 226.1. Samples: 67576. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:37:42,053][00307] Avg episode reward: [(0, '4.727')]
[2024-09-01 06:37:47,047][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 266240. Throughput: 0: 219.9. Samples: 68032. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 06:37:47,050][00307] Avg episode reward: [(0, '4.677')]
[2024-09-01 06:37:52,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 270336. Throughput: 0: 224.1. Samples: 69720. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 06:37:52,048][00307] Avg episode reward: [(0, '4.616')]
[2024-09-01 06:37:57,046][00307] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 278528. Throughput: 0: 230.4. Samples: 70986. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:37:57,053][00307] Avg episode reward: [(0, '4.600')]
[2024-09-01 06:38:02,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 278528. Throughput: 0: 223.9. Samples: 71578. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:38:02,056][00307] Avg episode reward: [(0, '4.721')]
[2024-09-01 06:38:06,090][04814] Updated weights for policy 0, policy_version 70 (0.0047)
[2024-09-01 06:38:07,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 286720. Throughput: 0: 222.8. Samples: 72966. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:38:07,050][00307] Avg episode reward: [(0, '4.574')]
[2024-09-01 06:38:12,046][00307] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 290816. Throughput: 0: 235.1. Samples: 74614. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:38:12,049][00307] Avg episode reward: [(0, '4.490')]
[2024-09-01 06:38:16,895][04801] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000072_294912.pth...
[2024-09-01 06:38:17,047][00307] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 294912. Throughput: 0: 227.2. Samples: 75048. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:38:17,077][00307] Avg episode reward: [(0, '4.539')]
[2024-09-01 06:38:17,319][04801] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000020_81920.pth
[2024-09-01 06:38:22,046][00307] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 294912. Throughput: 0: 199.6. Samples: 75394. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:38:22,056][00307] Avg episode reward: [(0, '4.563')]
[2024-09-01 06:38:27,047][00307] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 299008. Throughput: 0: 202.2. Samples: 76676. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:38:27,051][00307] Avg episode reward: [(0, '4.573')]
[2024-09-01 06:38:32,046][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 303104. Throughput: 0: 206.6. Samples: 77330. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:38:32,049][00307] Avg episode reward: [(0, '4.518')]
[2024-09-01 06:38:37,049][00307] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 307200. Throughput: 0: 197.9. Samples: 78626. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:38:37,055][00307] Avg episode reward: [(0, '4.552')]
[2024-09-01 06:38:42,046][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 311296. Throughput: 0: 202.5. Samples: 80100. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:38:42,048][00307] Avg episode reward: [(0, '4.670')]
[2024-09-01 06:38:47,047][00307] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 315392. Throughput: 0: 200.0. Samples: 80580. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:38:47,054][00307] Avg episode reward: [(0, '4.684')]
[2024-09-01 06:38:52,046][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 319488. Throughput: 0: 203.4. Samples: 82118. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:38:52,051][00307] Avg episode reward: [(0, '4.668')]
[2024-09-01 06:38:57,046][00307] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 874.7). Total num frames: 323584. Throughput: 0: 193.1. Samples: 83304. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:38:57,054][00307] Avg episode reward: [(0, '4.670')]
[2024-09-01 06:38:57,426][04814] Updated weights for policy 0, policy_version 80 (0.1538)
[2024-09-01 06:39:02,046][00307] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 331776. Throughput: 0: 201.2. Samples: 84102. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:39:02,049][00307] Avg episode reward: [(0, '4.742')]
[2024-09-01 06:39:07,049][00307] Fps is (10 sec: 1228.5, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 335872. Throughput: 0: 220.8. Samples: 85332. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:39:07,057][00307] Avg episode reward: [(0, '4.626')]
[2024-09-01 06:39:12,047][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 339968. Throughput: 0: 211.8. Samples: 86208. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:39:12,049][00307] Avg episode reward: [(0, '4.626')]
[2024-09-01 06:39:17,046][00307] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 344064. Throughput: 0: 219.0. Samples: 87184. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:39:17,051][00307] Avg episode reward: [(0, '4.579')]
[2024-09-01 06:39:22,052][00307] Fps is (10 sec: 818.7, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 348160. Throughput: 0: 216.3. Samples: 88358. Policy #0 lag: (min: 1.0, avg: 1.1, max: 2.0)
[2024-09-01 06:39:22,055][00307] Avg episode reward: [(0, '4.557')]
[2024-09-01 06:39:27,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 352256. Throughput: 0: 210.8. Samples: 89588. Policy #0 lag: (min: 1.0, avg: 1.1, max: 2.0)
[2024-09-01 06:39:27,056][00307] Avg episode reward: [(0, '4.511')]
[2024-09-01 06:39:32,046][00307] Fps is (10 sec: 819.7, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 356352. Throughput: 0: 217.8. Samples: 90382. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:39:32,055][00307] Avg episode reward: [(0, '4.439')]
[2024-09-01 06:39:37,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 360448. Throughput: 0: 215.8. Samples: 91830. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:39:37,049][00307] Avg episode reward: [(0, '4.391')]
[2024-09-01 06:39:42,047][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 364544. Throughput: 0: 215.6. Samples: 93008. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:39:42,050][00307] Avg episode reward: [(0, '4.343')]
[2024-09-01 06:39:45,083][04814] Updated weights for policy 0, policy_version 90 (0.1511)
[2024-09-01 06:39:47,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 368640. Throughput: 0: 208.0. Samples: 93460. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:39:47,049][00307] Avg episode reward: [(0, '4.281')]
[2024-09-01 06:39:52,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 372736. Throughput: 0: 215.4. Samples: 95026. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:39:52,052][00307] Avg episode reward: [(0, '4.307')]
[2024-09-01 06:39:57,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 376832. Throughput: 0: 227.5. Samples: 96446. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:39:57,049][00307] Avg episode reward: [(0, '4.295')]
[2024-09-01 06:40:02,046][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 380928. Throughput: 0: 214.4. Samples: 96834. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:40:02,049][00307] Avg episode reward: [(0, '4.377')]
[2024-09-01 06:40:07,047][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 385024. Throughput: 0: 221.7. Samples: 98332. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:40:07,054][00307] Avg episode reward: [(0, '4.394')]
[2024-09-01 06:40:12,046][00307] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 393216. Throughput: 0: 210.4. Samples: 99056. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 06:40:12,052][00307] Avg episode reward: [(0, '4.365')]
[2024-09-01 06:40:17,046][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 393216. Throughput: 0: 216.7. Samples: 100134. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 06:40:17,052][00307] Avg episode reward: [(0, '4.329')]
[2024-09-01 06:40:17,859][04801] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000097_397312.pth...
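The Saving/Removing checkpoint pairs in this stretch of the log are the rotation implied by keep_checkpoints=2 in the config, and the "Saving new best policy" entries follow from save_best_metric=reward with save_best_every_sec=5. A minimal sketch for watching a trained policy from this experiment directory, assuming the matching Sample Factory 2.x evaluation entry point (sf_examples.vizdoom.enjoy_vizdoom is an assumption on our part, not something this log records):

    # Hypothetical evaluation command; --train_dir/--experiment match the checkpoint paths logged above.
    python -m sf_examples.vizdoom.enjoy_vizdoom \
        --env=doom_health_gathering_supreme \
        --train_dir=/content/train_dir \
        --experiment=default_experiment \
        --device=cpu

With load_checkpoint_kind=latest in the config, this would load the newest checkpoint_p0/*.pth rather than the best-reward snapshot.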
[2024-09-01 06:40:18,008][04801] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000046_188416.pth
[2024-09-01 06:40:22,047][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 401408. Throughput: 0: 216.2. Samples: 101560. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:40:22,049][00307] Avg episode reward: [(0, '4.398')]
[2024-09-01 06:40:27,046][00307] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 405504. Throughput: 0: 223.1. Samples: 103046. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:40:27,049][00307] Avg episode reward: [(0, '4.355')]
[2024-09-01 06:40:30,666][04814] Updated weights for policy 0, policy_version 100 (0.1213)
[2024-09-01 06:40:32,051][00307] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 409600. Throughput: 0: 226.7. Samples: 103664. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:40:32,054][00307] Avg episode reward: [(0, '4.384')]
[2024-09-01 06:40:35,692][04801] Signal inference workers to stop experience collection... (100 times)
[2024-09-01 06:40:35,724][04814] InferenceWorker_p0-w0: stopping experience collection (100 times)
[2024-09-01 06:40:36,196][04801] Signal inference workers to resume experience collection... (100 times)
[2024-09-01 06:40:36,198][04814] InferenceWorker_p0-w0: resuming experience collection (100 times)
[2024-09-01 06:40:37,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 413696. Throughput: 0: 216.8. Samples: 104780. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:40:37,049][00307] Avg episode reward: [(0, '4.347')]
[2024-09-01 06:40:42,046][00307] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 417792. Throughput: 0: 218.8. Samples: 106294. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:40:42,054][00307] Avg episode reward: [(0, '4.408')]
[2024-09-01 06:40:47,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 421888. Throughput: 0: 220.2. Samples: 106742. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:40:47,049][00307] Avg episode reward: [(0, '4.377')]
[2024-09-01 06:40:52,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 425984. Throughput: 0: 212.6. Samples: 107898. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:40:52,050][00307] Avg episode reward: [(0, '4.373')]
[2024-09-01 06:40:57,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 430080. Throughput: 0: 232.1. Samples: 109502. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0)
[2024-09-01 06:40:57,053][00307] Avg episode reward: [(0, '4.432')]
[2024-09-01 06:41:02,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 434176. Throughput: 0: 224.2. Samples: 110224. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0)
[2024-09-01 06:41:02,054][00307] Avg episode reward: [(0, '4.479')]
[2024-09-01 06:41:07,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 438272. Throughput: 0: 216.9. Samples: 111320. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:41:07,055][00307] Avg episode reward: [(0, '4.489')]
[2024-09-01 06:41:12,046][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 442368. Throughput: 0: 218.2. Samples: 112864. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:41:12,049][00307] Avg episode reward: [(0, '4.561')]
[2024-09-01 06:41:17,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 446464. Throughput: 0: 219.0. Samples: 113516. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:41:17,050][00307] Avg episode reward: [(0, '4.600')]
[2024-09-01 06:41:17,362][04814] Updated weights for policy 0, policy_version 110 (0.0603)
[2024-09-01 06:41:22,046][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 450560. Throughput: 0: 223.3. Samples: 114830. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:41:22,053][00307] Avg episode reward: [(0, '4.725')]
[2024-09-01 06:41:27,047][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 454656. Throughput: 0: 216.5. Samples: 116038. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:41:27,055][00307] Avg episode reward: [(0, '4.653')]
[2024-09-01 06:41:32,046][00307] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 462848. Throughput: 0: 224.6. Samples: 116850. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:41:32,054][00307] Avg episode reward: [(0, '4.709')]
[2024-09-01 06:41:37,049][00307] Fps is (10 sec: 1228.5, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 466944. Throughput: 0: 225.1. Samples: 118030. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:41:37,051][00307] Avg episode reward: [(0, '4.812')]
[2024-09-01 06:41:41,549][04801] Saving new best policy, reward=4.812!
[2024-09-01 06:41:42,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 471040. Throughput: 0: 212.8. Samples: 119078. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:41:42,055][00307] Avg episode reward: [(0, '4.812')]
[2024-09-01 06:41:47,046][00307] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 475136. Throughput: 0: 219.0. Samples: 120080. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:41:47,054][00307] Avg episode reward: [(0, '4.810')]
[2024-09-01 06:41:52,047][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 479232. Throughput: 0: 225.2. Samples: 121456. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:41:52,062][00307] Avg episode reward: [(0, '4.817')]
[2024-09-01 06:41:55,244][04801] Saving new best policy, reward=4.817!
[2024-09-01 06:41:57,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 483328. Throughput: 0: 212.5. Samples: 122426. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:41:57,052][00307] Avg episode reward: [(0, '4.819')]
[2024-09-01 06:42:00,066][04801] Saving new best policy, reward=4.819!
[2024-09-01 06:42:02,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 487424. Throughput: 0: 215.3. Samples: 123204. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:42:02,052][00307] Avg episode reward: [(0, '4.882')]
[2024-09-01 06:42:03,931][04801] Saving new best policy, reward=4.882!
[2024-09-01 06:42:03,939][04814] Updated weights for policy 0, policy_version 120 (0.0059)
[2024-09-01 06:42:07,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 491520. Throughput: 0: 221.1. Samples: 124780. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:42:07,049][00307] Avg episode reward: [(0, '4.867')]
[2024-09-01 06:42:12,049][00307] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 860.8). Total num frames: 495616. Throughput: 0: 218.7. Samples: 125882. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:42:12,059][00307] Avg episode reward: [(0, '4.933')]
[2024-09-01 06:42:12,088][00307] Components not started: RolloutWorker_w1, RolloutWorker_w5, RolloutWorker_w6, wait_time=600.1 seconds
[2024-09-01 06:42:14,299][04801] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000122_499712.pth...
[2024-09-01 06:42:14,410][04801] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000072_294912.pth
[2024-09-01 06:42:14,422][04801] Saving new best policy, reward=4.933!
[2024-09-01 06:42:17,047][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 499712. Throughput: 0: 212.5. Samples: 126414. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:42:17,055][00307] Avg episode reward: [(0, '4.976')]
[2024-09-01 06:42:22,046][00307] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 503808. Throughput: 0: 217.6. Samples: 127820. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:42:22,049][00307] Avg episode reward: [(0, '4.923')]
[2024-09-01 06:42:22,577][04801] Saving new best policy, reward=4.976!
[2024-09-01 06:42:27,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 507904. Throughput: 0: 226.7. Samples: 129278. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:42:27,050][00307] Avg episode reward: [(0, '4.930')]
[2024-09-01 06:42:32,047][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 512000. Throughput: 0: 211.5. Samples: 129596. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:42:32,050][00307] Avg episode reward: [(0, '4.953')]
[2024-09-01 06:42:37,046][00307] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 520192. Throughput: 0: 217.6. Samples: 131248. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:42:37,053][00307] Avg episode reward: [(0, '4.908')]
[2024-09-01 06:42:42,051][00307] Fps is (10 sec: 1228.2, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 524288. Throughput: 0: 223.8. Samples: 132498. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:42:42,059][00307] Avg episode reward: [(0, '4.918')]
[2024-09-01 06:42:47,046][00307] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 524288. Throughput: 0: 219.6. Samples: 133084. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:42:47,050][00307] Avg episode reward: [(0, '4.826')]
[2024-09-01 06:42:51,731][04814] Updated weights for policy 0, policy_version 130 (0.1524)
[2024-09-01 06:42:52,046][00307] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 532480. Throughput: 0: 215.2. Samples: 134464. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:42:52,049][00307] Avg episode reward: [(0, '4.746')]
[2024-09-01 06:42:57,046][00307] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 536576. Throughput: 0: 220.7. Samples: 135812. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:42:57,049][00307] Avg episode reward: [(0, '4.652')]
[2024-09-01 06:43:02,054][00307] Fps is (10 sec: 818.6, 60 sec: 887.4, 300 sec: 860.8). Total num frames: 540672. Throughput: 0: 221.4. Samples: 136380. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:43:02,057][00307] Avg episode reward: [(0, '4.617')]
[2024-09-01 06:43:07,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 544768. Throughput: 0: 215.9. Samples: 137534. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:43:07,049][00307] Avg episode reward: [(0, '4.541')]
[2024-09-01 06:43:12,046][00307] Fps is (10 sec: 819.8, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 548864. Throughput: 0: 223.5. Samples: 139336. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:43:12,050][00307] Avg episode reward: [(0, '4.449')]
[2024-09-01 06:43:17,047][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 552960. Throughput: 0: 225.9. Samples: 139760. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:43:17,050][00307] Avg episode reward: [(0, '4.426')]
[2024-09-01 06:43:22,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 557056. Throughput: 0: 214.4. Samples: 140894. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:43:22,049][00307] Avg episode reward: [(0, '4.377')]
[2024-09-01 06:43:27,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 561152. Throughput: 0: 224.5. Samples: 142598. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:43:27,050][00307] Avg episode reward: [(0, '4.309')]
[2024-09-01 06:43:32,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 565248. Throughput: 0: 226.2. Samples: 143264. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:43:32,054][00307] Avg episode reward: [(0, '4.408')]
[2024-09-01 06:43:37,047][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 569344. Throughput: 0: 216.8. Samples: 144222. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:43:37,052][00307] Avg episode reward: [(0, '4.430')]
[2024-09-01 06:43:38,311][04814] Updated weights for policy 0, policy_version 140 (0.1026)
[2024-09-01 06:43:42,046][00307] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 577536. Throughput: 0: 220.1. Samples: 145718. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:43:42,054][00307] Avg episode reward: [(0, '4.423')]
[2024-09-01 06:43:47,046][00307] Fps is (10 sec: 1228.9, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 581632. Throughput: 0: 227.7. Samples: 146624. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:43:47,052][00307] Avg episode reward: [(0, '4.361')]
[2024-09-01 06:43:52,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 585728. Throughput: 0: 225.0. Samples: 147658. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:43:52,054][00307] Avg episode reward: [(0, '4.455')]
[2024-09-01 06:43:57,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 589824. Throughput: 0: 216.6. Samples: 149082. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:43:57,056][00307] Avg episode reward: [(0, '4.445')]
[2024-09-01 06:44:02,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 874.7). Total num frames: 593920. Throughput: 0: 221.6. Samples: 149734. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:44:02,053][00307] Avg episode reward: [(0, '4.538')]
[2024-09-01 06:44:07,050][00307] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 598016. Throughput: 0: 225.7. Samples: 151050. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:44:07,052][00307] Avg episode reward: [(0, '4.551')]
[2024-09-01 06:44:12,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 602112. Throughput: 0: 218.0. Samples: 152406. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:44:12,054][00307] Avg episode reward: [(0, '4.597')]
[2024-09-01 06:44:14,054][04801] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000148_606208.pth...
[2024-09-01 06:44:14,163][04801] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000097_397312.pth
[2024-09-01 06:44:17,047][00307] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 874.8). Total num frames: 606208. Throughput: 0: 219.2. Samples: 153130. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:44:17,051][00307] Avg episode reward: [(0, '4.620')]
[2024-09-01 06:44:22,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 610304. Throughput: 0: 223.8. Samples: 154294. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:44:22,051][00307] Avg episode reward: [(0, '4.666')]
[2024-09-01 06:44:24,054][04814] Updated weights for policy 0, policy_version 150 (0.0556)
[2024-09-01 06:44:27,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 614400. Throughput: 0: 219.8. Samples: 155608. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:44:27,049][00307] Avg episode reward: [(0, '4.680')]
[2024-09-01 06:44:31,997][04801] Signal inference workers to stop experience collection... (150 times)
[2024-09-01 06:44:32,021][04814] InferenceWorker_p0-w0: stopping experience collection (150 times)
[2024-09-01 06:44:32,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 618496. Throughput: 0: 212.0. Samples: 156164. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:44:32,049][00307] Avg episode reward: [(0, '4.797')]
[2024-09-01 06:44:32,619][04801] Signal inference workers to resume experience collection... (150 times)
[2024-09-01 06:44:32,620][04814] InferenceWorker_p0-w0: resuming experience collection (150 times)
[2024-09-01 06:44:37,046][00307] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 626688. Throughput: 0: 226.6. Samples: 157856. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:44:37,052][00307] Avg episode reward: [(0, '4.706')]
[2024-09-01 06:44:42,046][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 626688. Throughput: 0: 216.6. Samples: 158828. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:44:42,049][00307] Avg episode reward: [(0, '4.680')]
[2024-09-01 06:44:47,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 634880. Throughput: 0: 221.1. Samples: 159684. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:44:47,053][00307] Avg episode reward: [(0, '4.672')]
[2024-09-01 06:44:52,046][00307] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 638976. Throughput: 0: 223.2. Samples: 161094. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:44:52,049][00307] Avg episode reward: [(0, '4.888')]
[2024-09-01 06:44:57,052][00307] Fps is (10 sec: 818.7, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 643072. Throughput: 0: 217.4. Samples: 162188. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:44:57,058][00307] Avg episode reward: [(0, '4.846')]
[2024-09-01 06:45:02,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 647168. Throughput: 0: 217.8. Samples: 162930. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:45:02,054][00307] Avg episode reward: [(0, '4.846')]
[2024-09-01 06:45:07,046][00307] Fps is (10 sec: 819.7, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 651264. Throughput: 0: 220.5. Samples: 164218. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:45:07,052][00307] Avg episode reward: [(0, '4.885')]
[2024-09-01 06:45:09,701][04814] Updated weights for policy 0, policy_version 160 (0.2669)
[2024-09-01 06:45:12,047][00307] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 655360. Throughput: 0: 221.2. Samples: 165562. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:45:12,057][00307] Avg episode reward: [(0, '4.840')]
[2024-09-01 06:45:17,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 659456. Throughput: 0: 221.5. Samples: 166132. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:45:17,051][00307] Avg episode reward: [(0, '4.791')]
[2024-09-01 06:45:22,051][00307] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 663552. Throughput: 0: 218.2. Samples: 167678. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:45:22,054][00307] Avg episode reward: [(0, '4.778')]
[2024-09-01 06:45:27,047][00307] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 874.8). Total num frames: 667648. Throughput: 0: 227.6. Samples: 169070. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 06:45:27,056][00307] Avg episode reward: [(0, '4.824')]
[2024-09-01 06:45:32,050][00307] Fps is (10 sec: 819.3, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 671744. Throughput: 0: 219.4. Samples: 169556. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 06:45:32,057][00307] Avg episode reward: [(0, '4.821')]
[2024-09-01 06:45:37,047][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 675840. Throughput: 0: 221.0. Samples: 171040. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:45:37,052][00307] Avg episode reward: [(0, '4.783')]
[2024-09-01 06:45:42,046][00307] Fps is (10 sec: 1229.2, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 684032. Throughput: 0: 228.2. Samples: 172456. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:45:42,049][00307] Avg episode reward: [(0, '4.838')]
[2024-09-01 06:45:47,046][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 684032. Throughput: 0: 225.5. Samples: 173076. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:45:47,049][00307] Avg episode reward: [(0, '4.829')]
[2024-09-01 06:45:52,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 692224. Throughput: 0: 223.4. Samples: 174270. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:45:52,049][00307] Avg episode reward: [(0, '4.742')]
[2024-09-01 06:45:56,256][04814] Updated weights for policy 0, policy_version 170 (0.1006)
[2024-09-01 06:45:57,046][00307] Fps is (10 sec: 1228.8, 60 sec: 887.6, 300 sec: 888.6). Total num frames: 696320. Throughput: 0: 224.4. Samples: 175662. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:45:57,048][00307] Avg episode reward: [(0, '4.725')]
[2024-09-01 06:46:02,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 700416. Throughput: 0: 228.5. Samples: 176416. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:46:02,053][00307] Avg episode reward: [(0, '4.843')]
[2024-09-01 06:46:07,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 704512. Throughput: 0: 217.5. Samples: 177464. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:46:07,048][00307] Avg episode reward: [(0, '4.881')]
[2024-09-01 06:46:12,047][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 708608. Throughput: 0: 219.7. Samples: 178956. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:46:12,056][00307] Avg episode reward: [(0, '4.863')]
[2024-09-01 06:46:13,922][04801] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000174_712704.pth...
[2024-09-01 06:46:14,024][04801] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000122_499712.pth
[2024-09-01 06:46:17,054][00307] Fps is (10 sec: 818.6, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 712704. Throughput: 0: 227.5. Samples: 179796. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 06:46:17,058][00307] Avg episode reward: [(0, '4.873')]
[2024-09-01 06:46:22,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 716800. Throughput: 0: 215.4. Samples: 180734. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:46:22,056][00307] Avg episode reward: [(0, '4.976')]
[2024-09-01 06:46:27,047][00307] Fps is (10 sec: 819.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 720896. Throughput: 0: 221.8. Samples: 182436. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:46:27,049][00307] Avg episode reward: [(0, '4.849')]
[2024-09-01 06:46:32,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 724992. Throughput: 0: 219.1. Samples: 182934. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:46:32,054][00307] Avg episode reward: [(0, '5.001')]
[2024-09-01 06:46:37,049][00307] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 729088. Throughput: 0: 223.9. Samples: 184344. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:46:37,053][00307] Avg episode reward: [(0, '5.127')]
[2024-09-01 06:46:38,042][04801] Saving new best policy, reward=5.001!
[2024-09-01 06:46:42,046][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 733184. Throughput: 0: 221.7. Samples: 185640. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:46:42,049][00307] Avg episode reward: [(0, '5.156')]
[2024-09-01 06:46:42,425][04801] Saving new best policy, reward=5.127!
[2024-09-01 06:46:42,430][04814] Updated weights for policy 0, policy_version 180 (0.1015)
[2024-09-01 06:46:46,585][04801] Saving new best policy, reward=5.156!
[2024-09-01 06:46:47,046][00307] Fps is (10 sec: 1229.1, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 741376. Throughput: 0: 223.0. Samples: 186450. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 06:46:47,048][00307] Avg episode reward: [(0, '5.165')]
[2024-09-01 06:46:51,326][04801] Saving new best policy, reward=5.165!
[2024-09-01 06:46:52,051][00307] Fps is (10 sec: 1228.2, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 745472. Throughput: 0: 226.0.
Samples: 187634. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:46:52,059][00307] Avg episode reward: [(0, '5.060')] [2024-09-01 06:46:57,048][00307] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 749568. Throughput: 0: 218.7. Samples: 188800. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:46:57,059][00307] Avg episode reward: [(0, '5.055')] [2024-09-01 06:47:02,046][00307] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 753664. Throughput: 0: 220.7. Samples: 189724. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:47:02,048][00307] Avg episode reward: [(0, '5.189')] [2024-09-01 06:47:04,938][04801] Saving new best policy, reward=5.189! [2024-09-01 06:47:07,047][00307] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 757760. Throughput: 0: 230.2. Samples: 191092. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 06:47:07,054][00307] Avg episode reward: [(0, '5.182')] [2024-09-01 06:47:12,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 761856. Throughput: 0: 212.8. Samples: 192010. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 06:47:12,053][00307] Avg episode reward: [(0, '5.134')] [2024-09-01 06:47:17,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 888.6). Total num frames: 765952. Throughput: 0: 219.8. Samples: 192824. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 06:47:17,053][00307] Avg episode reward: [(0, '5.202')] [2024-09-01 06:47:18,984][04801] Saving new best policy, reward=5.202! [2024-09-01 06:47:22,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 770048. Throughput: 0: 223.7. Samples: 194408. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 06:47:22,057][00307] Avg episode reward: [(0, '5.121')] [2024-09-01 06:47:27,050][00307] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 774144. Throughput: 0: 218.8. Samples: 195486. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:47:27,055][00307] Avg episode reward: [(0, '5.019')] [2024-09-01 06:47:29,651][04814] Updated weights for policy 0, policy_version 190 (0.0556) [2024-09-01 06:47:32,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 778240. Throughput: 0: 208.5. Samples: 195832. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:47:32,048][00307] Avg episode reward: [(0, '5.043')] [2024-09-01 06:47:37,046][00307] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 874.8). Total num frames: 782336. Throughput: 0: 225.5. Samples: 197780. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:47:37,055][00307] Avg episode reward: [(0, '5.087')] [2024-09-01 06:47:42,047][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 786432. Throughput: 0: 226.0. Samples: 198968. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:47:42,049][00307] Avg episode reward: [(0, '5.065')] [2024-09-01 06:47:47,046][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 790528. Throughput: 0: 214.3. Samples: 199368. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:47:47,048][00307] Avg episode reward: [(0, '5.110')] [2024-09-01 06:47:52,046][00307] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 798720. Throughput: 0: 218.6. Samples: 200928. 
Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:47:52,051][00307] Avg episode reward: [(0, '5.071')] [2024-09-01 06:47:57,048][00307] Fps is (10 sec: 1228.6, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 802816. Throughput: 0: 228.1. Samples: 202274. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:47:57,055][00307] Avg episode reward: [(0, '5.094')] [2024-09-01 06:48:02,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 806912. Throughput: 0: 223.4. Samples: 202876. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:48:02,051][00307] Avg episode reward: [(0, '5.078')] [2024-09-01 06:48:07,046][00307] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 811008. Throughput: 0: 217.2. Samples: 204182. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:48:07,049][00307] Avg episode reward: [(0, '5.280')] [2024-09-01 06:48:09,337][04801] Saving new best policy, reward=5.280! [2024-09-01 06:48:12,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 815104. Throughput: 0: 229.7. Samples: 205820. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:48:12,050][00307] Avg episode reward: [(0, '5.254')] [2024-09-01 06:48:14,491][04801] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000200_819200.pth... [2024-09-01 06:48:14,494][04814] Updated weights for policy 0, policy_version 200 (0.0039) [2024-09-01 06:48:14,603][04801] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000148_606208.pth [2024-09-01 06:48:17,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 819200. Throughput: 0: 230.4. Samples: 206200. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:48:17,056][00307] Avg episode reward: [(0, '5.193')] [2024-09-01 06:48:22,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 823296. Throughput: 0: 214.3. Samples: 207422. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:48:22,049][00307] Avg episode reward: [(0, '5.117')] [2024-09-01 06:48:23,443][04801] Signal inference workers to stop experience collection... (200 times) [2024-09-01 06:48:23,474][04814] InferenceWorker_p0-w0: stopping experience collection (200 times) [2024-09-01 06:48:24,400][04801] Signal inference workers to resume experience collection... (200 times) [2024-09-01 06:48:24,402][04814] InferenceWorker_p0-w0: resuming experience collection (200 times) [2024-09-01 06:48:27,047][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 827392. Throughput: 0: 214.8. Samples: 208634. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:48:27,056][00307] Avg episode reward: [(0, '5.199')] [2024-09-01 06:48:32,046][00307] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 827392. Throughput: 0: 216.4. Samples: 209104. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:48:32,052][00307] Avg episode reward: [(0, '5.199')] [2024-09-01 06:48:37,046][00307] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 831488. Throughput: 0: 197.5. Samples: 209816. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:48:37,050][00307] Avg episode reward: [(0, '5.208')] [2024-09-01 06:48:42,047][00307] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 839680. Throughput: 0: 195.9. Samples: 211088. 
Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:48:42,049][00307] Avg episode reward: [(0, '5.186')] [2024-09-01 06:48:47,046][00307] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 843776. Throughput: 0: 205.4. Samples: 212118. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:48:47,049][00307] Avg episode reward: [(0, '5.275')] [2024-09-01 06:48:52,046][00307] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 860.9). Total num frames: 843776. Throughput: 0: 198.3. Samples: 213104. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:48:52,051][00307] Avg episode reward: [(0, '5.495')] [2024-09-01 06:48:52,188][04801] Saving new best policy, reward=5.495! [2024-09-01 06:48:57,046][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 851968. Throughput: 0: 190.7. Samples: 214400. Policy #0 lag: (min: 1.0, avg: 1.1, max: 2.0) [2024-09-01 06:48:57,050][00307] Avg episode reward: [(0, '5.653')] [2024-09-01 06:49:00,295][04801] Saving new best policy, reward=5.653! [2024-09-01 06:49:02,046][00307] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 856064. Throughput: 0: 199.1. Samples: 215160. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 06:49:02,049][00307] Avg episode reward: [(0, '5.555')] [2024-09-01 06:49:06,043][04814] Updated weights for policy 0, policy_version 210 (0.2196) [2024-09-01 06:49:07,046][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 860160. Throughput: 0: 199.0. Samples: 216376. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 06:49:07,053][00307] Avg episode reward: [(0, '5.485')] [2024-09-01 06:49:12,046][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 864256. Throughput: 0: 198.8. Samples: 217580. Policy #0 lag: (min: 1.0, avg: 1.1, max: 2.0) [2024-09-01 06:49:12,048][00307] Avg episode reward: [(0, '5.439')] [2024-09-01 06:49:17,046][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 868352. Throughput: 0: 205.4. Samples: 218346. Policy #0 lag: (min: 1.0, avg: 1.1, max: 2.0) [2024-09-01 06:49:17,053][00307] Avg episode reward: [(0, '5.471')] [2024-09-01 06:49:22,047][00307] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 872448. Throughput: 0: 221.1. Samples: 219766. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 06:49:22,050][00307] Avg episode reward: [(0, '5.398')] [2024-09-01 06:49:27,046][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 876544. Throughput: 0: 217.3. Samples: 220866. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 06:49:27,050][00307] Avg episode reward: [(0, '5.473')] [2024-09-01 06:49:32,046][00307] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 880640. Throughput: 0: 209.3. Samples: 221538. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:49:32,049][00307] Avg episode reward: [(0, '5.506')] [2024-09-01 06:49:37,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 884736. Throughput: 0: 227.5. Samples: 223340. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:49:37,053][00307] Avg episode reward: [(0, '5.353')] [2024-09-01 06:49:42,049][00307] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 860.8). Total num frames: 888832. Throughput: 0: 221.7. Samples: 224378. 
Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:49:42,058][00307] Avg episode reward: [(0, '5.220')] [2024-09-01 06:49:47,046][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 892928. Throughput: 0: 217.0. Samples: 224926. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:49:47,051][00307] Avg episode reward: [(0, '5.088')] [2024-09-01 06:49:51,577][04814] Updated weights for policy 0, policy_version 220 (0.2052) [2024-09-01 06:49:52,046][00307] Fps is (10 sec: 1229.1, 60 sec: 955.7, 300 sec: 874.8). Total num frames: 901120. Throughput: 0: 228.8. Samples: 226672. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:49:52,049][00307] Avg episode reward: [(0, '5.236')] [2024-09-01 06:49:57,046][00307] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 905216. Throughput: 0: 216.0. Samples: 227302. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:49:57,053][00307] Avg episode reward: [(0, '5.256')] [2024-09-01 06:50:02,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 909312. Throughput: 0: 219.1. Samples: 228206. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:50:02,049][00307] Avg episode reward: [(0, '5.168')] [2024-09-01 06:50:07,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 913408. Throughput: 0: 219.0. Samples: 229620. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:50:07,051][00307] Avg episode reward: [(0, '5.149')] [2024-09-01 06:50:12,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 917504. Throughput: 0: 220.7. Samples: 230798. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:50:12,051][00307] Avg episode reward: [(0, '5.023')] [2024-09-01 06:50:16,706][04801] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000225_921600.pth... [2024-09-01 06:50:16,813][04801] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000174_712704.pth [2024-09-01 06:50:17,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.8). Total num frames: 921600. Throughput: 0: 218.7. Samples: 231380. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:50:17,048][00307] Avg episode reward: [(0, '4.984')] [2024-09-01 06:50:22,047][00307] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 925696. Throughput: 0: 210.3. Samples: 232804. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:50:22,056][00307] Avg episode reward: [(0, '5.105')] [2024-09-01 06:50:27,047][00307] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 929792. Throughput: 0: 216.0. Samples: 234098. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:50:27,059][00307] Avg episode reward: [(0, '5.160')] [2024-09-01 06:50:32,046][00307] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 933888. Throughput: 0: 217.3. Samples: 234706. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:50:32,052][00307] Avg episode reward: [(0, '5.124')] [2024-09-01 06:50:37,046][00307] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 937984. Throughput: 0: 209.7. Samples: 236108. 
Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:50:37,049][00307] Avg episode reward: [(0, '4.992')] [2024-09-01 06:50:39,171][04814] Updated weights for policy 0, policy_version 230 (0.1174) [2024-09-01 06:50:42,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 942080. Throughput: 0: 232.5. Samples: 237764. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:50:42,055][00307] Avg episode reward: [(0, '4.904')] [2024-09-01 06:50:47,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 946176. Throughput: 0: 219.7. Samples: 238092. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 06:50:47,049][00307] Avg episode reward: [(0, '4.904')] [2024-09-01 06:50:52,050][00307] Fps is (10 sec: 818.9, 60 sec: 819.2, 300 sec: 860.8). Total num frames: 950272. Throughput: 0: 216.6. Samples: 239368. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 06:50:52,053][00307] Avg episode reward: [(0, '5.005')] [2024-09-01 06:50:57,046][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 954368. Throughput: 0: 223.7. Samples: 240864. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:50:57,049][00307] Avg episode reward: [(0, '5.030')] [2024-09-01 06:51:02,046][00307] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 958464. Throughput: 0: 224.1. Samples: 241464. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:51:02,052][00307] Avg episode reward: [(0, '5.118')] [2024-09-01 06:51:07,046][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 962560. Throughput: 0: 217.5. Samples: 242590. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:51:07,050][00307] Avg episode reward: [(0, '5.036')] [2024-09-01 06:51:12,046][00307] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.8). Total num frames: 970752. Throughput: 0: 219.5. Samples: 243974. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 06:51:12,050][00307] Avg episode reward: [(0, '5.049')] [2024-09-01 06:51:17,046][00307] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 974848. Throughput: 0: 228.3. Samples: 244980. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 06:51:17,051][00307] Avg episode reward: [(0, '4.990')] [2024-09-01 06:51:22,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 978944. Throughput: 0: 221.0. Samples: 246054. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:51:22,049][00307] Avg episode reward: [(0, '5.003')] [2024-09-01 06:51:26,156][04814] Updated weights for policy 0, policy_version 240 (0.1025) [2024-09-01 06:51:27,047][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 983040. Throughput: 0: 209.0. Samples: 247168. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:51:27,055][00307] Avg episode reward: [(0, '5.055')] [2024-09-01 06:51:32,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 987136. Throughput: 0: 222.9. Samples: 248122. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:51:32,048][00307] Avg episode reward: [(0, '5.109')] [2024-09-01 06:51:37,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 991232. Throughput: 0: 222.1. Samples: 249360. 
Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:51:37,052][00307] Avg episode reward: [(0, '5.128')] [2024-09-01 06:51:42,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 995328. Throughput: 0: 214.4. Samples: 250512. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:51:42,056][00307] Avg episode reward: [(0, '5.177')] [2024-09-01 06:51:47,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 999424. Throughput: 0: 217.5. Samples: 251250. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:51:47,049][00307] Avg episode reward: [(0, '5.111')] [2024-09-01 06:51:52,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 1003520. Throughput: 0: 225.8. Samples: 252752. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:51:52,050][00307] Avg episode reward: [(0, '5.037')] [2024-09-01 06:51:57,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 1007616. Throughput: 0: 217.6. Samples: 253766. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:51:57,049][00307] Avg episode reward: [(0, '4.948')] [2024-09-01 06:52:02,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 1011712. Throughput: 0: 213.0. Samples: 254564. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:52:02,053][00307] Avg episode reward: [(0, '4.961')] [2024-09-01 06:52:07,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 1015808. Throughput: 0: 224.5. Samples: 256158. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:52:07,050][00307] Avg episode reward: [(0, '5.023')] [2024-09-01 06:52:12,047][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 1019904. Throughput: 0: 222.2. Samples: 257168. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 06:52:12,050][00307] Avg episode reward: [(0, '5.196')] [2024-09-01 06:52:12,072][00307] Components not started: RolloutWorker_w1, RolloutWorker_w5, RolloutWorker_w6, wait_time=1200.0 seconds [2024-09-01 06:52:13,055][04814] Updated weights for policy 0, policy_version 250 (0.0048) [2024-09-01 06:52:17,046][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 1024000. Throughput: 0: 214.4. Samples: 257768. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 06:52:17,054][00307] Avg episode reward: [(0, '5.176')] [2024-09-01 06:52:17,206][04801] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000251_1028096.pth... [2024-09-01 06:52:17,304][04801] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000200_819200.pth [2024-09-01 06:52:22,048][00307] Fps is (10 sec: 1228.6, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 1032192. Throughput: 0: 224.4. Samples: 259460. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 06:52:22,055][00307] Avg episode reward: [(0, '5.169')] [2024-09-01 06:52:24,935][04801] Signal inference workers to stop experience collection... (250 times) [2024-09-01 06:52:25,014][04814] InferenceWorker_p0-w0: stopping experience collection (250 times) [2024-09-01 06:52:26,631][04801] Signal inference workers to resume experience collection... (250 times) [2024-09-01 06:52:26,638][04814] InferenceWorker_p0-w0: resuming experience collection (250 times) [2024-09-01 06:52:27,046][00307] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1036288. Throughput: 0: 213.3. 
Samples: 260110. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:52:27,054][00307] Avg episode reward: [(0, '5.163')] [2024-09-01 06:52:32,046][00307] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1040384. Throughput: 0: 215.8. Samples: 260960. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:52:32,057][00307] Avg episode reward: [(0, '5.212')] [2024-09-01 06:52:37,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1044480. Throughput: 0: 216.1. Samples: 262476. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:52:37,053][00307] Avg episode reward: [(0, '5.220')] [2024-09-01 06:52:42,047][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1048576. Throughput: 0: 223.3. Samples: 263814. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:52:42,050][00307] Avg episode reward: [(0, '5.324')] [2024-09-01 06:52:47,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 1052672. Throughput: 0: 218.0. Samples: 264372. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:52:47,050][00307] Avg episode reward: [(0, '5.275')] [2024-09-01 06:52:52,046][00307] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 1056768. Throughput: 0: 215.7. Samples: 265864. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:52:52,050][00307] Avg episode reward: [(0, '5.349')] [2024-09-01 06:52:57,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 1060864. Throughput: 0: 223.6. Samples: 267230. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:52:57,050][00307] Avg episode reward: [(0, '5.519')] [2024-09-01 06:53:00,005][04814] Updated weights for policy 0, policy_version 260 (0.3423) [2024-09-01 06:53:02,047][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 1064960. Throughput: 0: 219.0. Samples: 267624. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:53:02,050][00307] Avg episode reward: [(0, '5.408')] [2024-09-01 06:53:07,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 1069056. Throughput: 0: 214.9. Samples: 269132. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:53:07,055][00307] Avg episode reward: [(0, '5.474')] [2024-09-01 06:53:12,046][00307] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 874.7). Total num frames: 1077248. Throughput: 0: 231.2. Samples: 270516. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:53:12,051][00307] Avg episode reward: [(0, '5.451')] [2024-09-01 06:53:17,055][00307] Fps is (10 sec: 818.5, 60 sec: 887.3, 300 sec: 860.8). Total num frames: 1077248. Throughput: 0: 224.7. Samples: 271074. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:53:17,065][00307] Avg episode reward: [(0, '5.599')] [2024-09-01 06:53:22,046][00307] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 1081344. Throughput: 0: 223.0. Samples: 272512. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:53:22,049][00307] Avg episode reward: [(0, '5.639')] [2024-09-01 06:53:27,046][00307] Fps is (10 sec: 1229.9, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1089536. Throughput: 0: 220.1. Samples: 273720. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:53:27,049][00307] Avg episode reward: [(0, '5.708')] [2024-09-01 06:53:31,658][04801] Saving new best policy, reward=5.708! 
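The recurring "Saving new best policy, reward=…!" lines come from the save_best_metric=reward setting in the run configuration: the learner tracks the highest smoothed episode reward seen so far (only after save_best_after=100000 env frames) and snapshots the policy whenever that record is strictly beaten. A minimal sketch of that bookkeeping follows; `BestPolicyTracker` and `maybe_save_best` are invented names for illustration, not Sample Factory's actual implementation:

```python
import shutil

class BestPolicyTracker:
    """Track the best smoothed episode reward and snapshot the policy when
    it improves (cf. save_best_metric=reward, save_best_after=100000).
    Illustrative sketch only; not Sample Factory's real code."""

    def __init__(self, save_best_after=100_000):
        self.best_reward = float("-inf")
        self.save_best_after = save_best_after  # min env frames before saving

    def maybe_save_best(self, avg_reward, env_frames, latest_ckpt, best_ckpt):
        # Too early in training: the averaged metric is still too noisy.
        if env_frames < self.save_best_after:
            return False
        # Strict improvement only; an unchanged average triggers nothing.
        if avg_reward <= self.best_reward:
            return False
        self.best_reward = avg_reward
        print(f"Saving new best policy, reward={avg_reward:.3f}!")
        shutil.copyfile(latest_ckpt, best_ckpt)  # snapshot the newest checkpoint
        return True
```

Note the strict inequality: repeated reports of the same average (such as the back-to-back '5.199' entries above) do not produce a save.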
[2024-09-01 06:53:32,046][00307] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1093632. Throughput: 0: 225.1. Samples: 274502. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 06:53:32,052][00307] Avg episode reward: [(0, '5.943')] [2024-09-01 06:53:36,694][04801] Saving new best policy, reward=5.943! [2024-09-01 06:53:37,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1097728. Throughput: 0: 216.7. Samples: 275616. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 06:53:37,048][00307] Avg episode reward: [(0, '5.927')] [2024-09-01 06:53:42,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1101824. Throughput: 0: 217.8. Samples: 277030. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 06:53:42,054][00307] Avg episode reward: [(0, '5.882')] [2024-09-01 06:53:45,010][04814] Updated weights for policy 0, policy_version 270 (0.0540) [2024-09-01 06:53:47,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1105920. Throughput: 0: 225.6. Samples: 277778. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 06:53:47,055][00307] Avg episode reward: [(0, '5.654')] [2024-09-01 06:53:52,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1110016. Throughput: 0: 216.1. Samples: 278856. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:53:52,051][00307] Avg episode reward: [(0, '5.703')] [2024-09-01 06:53:57,047][00307] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1114112. Throughput: 0: 216.9. Samples: 280276. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:53:57,053][00307] Avg episode reward: [(0, '5.742')] [2024-09-01 06:54:02,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1118208. Throughput: 0: 221.2. Samples: 281026. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:54:02,049][00307] Avg episode reward: [(0, '5.816')] [2024-09-01 06:54:07,050][00307] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 1122304. Throughput: 0: 216.9. Samples: 282274. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:54:07,056][00307] Avg episode reward: [(0, '5.825')] [2024-09-01 06:54:12,046][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 1126400. Throughput: 0: 220.7. Samples: 283650. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:54:12,049][00307] Avg episode reward: [(0, '5.706')] [2024-09-01 06:54:13,705][04801] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000276_1130496.pth... [2024-09-01 06:54:13,815][04801] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000225_921600.pth [2024-09-01 06:54:17,046][00307] Fps is (10 sec: 819.5, 60 sec: 887.6, 300 sec: 874.7). Total num frames: 1130496. Throughput: 0: 219.7. Samples: 284390. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:54:17,059][00307] Avg episode reward: [(0, '5.779')] [2024-09-01 06:54:22,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1134592. Throughput: 0: 221.4. Samples: 285578. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:54:22,054][00307] Avg episode reward: [(0, '5.752')] [2024-09-01 06:54:27,047][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 1138688. Throughput: 0: 219.9. Samples: 286924. 
Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:54:27,055][00307] Avg episode reward: [(0, '5.973')] [2024-09-01 06:54:31,915][04801] Saving new best policy, reward=5.973! [2024-09-01 06:54:31,921][04814] Updated weights for policy 0, policy_version 280 (0.0048) [2024-09-01 06:54:32,046][00307] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1146880. Throughput: 0: 218.8. Samples: 287624. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:54:32,049][00307] Avg episode reward: [(0, '6.098')] [2024-09-01 06:54:36,132][04801] Saving new best policy, reward=6.098! [2024-09-01 06:54:37,046][00307] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1150976. Throughput: 0: 224.0. Samples: 288936. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:54:37,056][00307] Avg episode reward: [(0, '6.125')] [2024-09-01 06:54:42,046][00307] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 1150976. Throughput: 0: 216.4. Samples: 290014. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:54:42,049][00307] Avg episode reward: [(0, '6.202')] [2024-09-01 06:54:42,373][04801] Saving new best policy, reward=6.125! [2024-09-01 06:54:46,303][04801] Saving new best policy, reward=6.202! [2024-09-01 06:54:47,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1159168. Throughput: 0: 219.0. Samples: 290882. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:54:47,053][00307] Avg episode reward: [(0, '6.009')] [2024-09-01 06:54:52,048][00307] Fps is (10 sec: 1228.7, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 1163264. Throughput: 0: 220.4. Samples: 292190. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:54:52,051][00307] Avg episode reward: [(0, '6.051')] [2024-09-01 06:54:57,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1167360. Throughput: 0: 212.4. Samples: 293206. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:54:57,049][00307] Avg episode reward: [(0, '5.880')] [2024-09-01 06:55:02,046][00307] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1171456. Throughput: 0: 213.8. Samples: 294010. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:55:02,050][00307] Avg episode reward: [(0, '5.998')] [2024-09-01 06:55:07,047][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1175552. Throughput: 0: 219.4. Samples: 295452. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:55:07,050][00307] Avg episode reward: [(0, '6.144')] [2024-09-01 06:55:12,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1179648. Throughput: 0: 214.3. Samples: 296566. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:55:12,049][00307] Avg episode reward: [(0, '6.232')] [2024-09-01 06:55:15,481][04801] Saving new best policy, reward=6.232! [2024-09-01 06:55:17,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1183744. Throughput: 0: 211.4. Samples: 297138. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:55:17,049][00307] Avg episode reward: [(0, '6.224')] [2024-09-01 06:55:19,221][04814] Updated weights for policy 0, policy_version 290 (0.1680) [2024-09-01 06:55:22,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1187840. Throughput: 0: 217.4. Samples: 298718. 
Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:55:22,049][00307] Avg episode reward: [(0, '5.844')] [2024-09-01 06:55:27,052][00307] Fps is (10 sec: 818.7, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 1191936. Throughput: 0: 225.6. Samples: 300168. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 06:55:27,057][00307] Avg episode reward: [(0, '6.150')] [2024-09-01 06:55:32,046][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 1196032. Throughput: 0: 212.4. Samples: 300438. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 06:55:32,054][00307] Avg episode reward: [(0, '6.086')] [2024-09-01 06:55:37,047][00307] Fps is (10 sec: 819.6, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 1200128. Throughput: 0: 217.4. Samples: 301972. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:55:37,050][00307] Avg episode reward: [(0, '6.188')] [2024-09-01 06:55:42,046][00307] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1208320. Throughput: 0: 225.4. Samples: 303348. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:55:42,054][00307] Avg episode reward: [(0, '6.515')] [2024-09-01 06:55:47,046][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 1208320. Throughput: 0: 220.5. Samples: 303932. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:55:47,053][00307] Avg episode reward: [(0, '6.549')] [2024-09-01 06:55:48,061][04801] Saving new best policy, reward=6.515! [2024-09-01 06:55:48,186][04801] Saving new best policy, reward=6.549! [2024-09-01 06:55:52,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1216512. Throughput: 0: 219.8. Samples: 305342. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:55:52,053][00307] Avg episode reward: [(0, '6.582')] [2024-09-01 06:55:56,102][04801] Saving new best policy, reward=6.582! [2024-09-01 06:55:57,046][00307] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1220608. Throughput: 0: 227.8. Samples: 306818. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:55:57,054][00307] Avg episode reward: [(0, '6.378')] [2024-09-01 06:56:02,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1224704. Throughput: 0: 226.2. Samples: 307318. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:56:02,049][00307] Avg episode reward: [(0, '6.353')] [2024-09-01 06:56:06,179][04814] Updated weights for policy 0, policy_version 300 (0.0553) [2024-09-01 06:56:07,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1228800. Throughput: 0: 219.1. Samples: 308578. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:56:07,048][00307] Avg episode reward: [(0, '6.458')] [2024-09-01 06:56:12,047][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1232896. Throughput: 0: 220.2. Samples: 310074. Policy #0 lag: (min: 1.0, avg: 1.1, max: 2.0) [2024-09-01 06:56:12,052][00307] Avg episode reward: [(0, '6.456')] [2024-09-01 06:56:14,443][04801] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000302_1236992.pth... [2024-09-01 06:56:14,538][04801] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000251_1028096.pth [2024-09-01 06:56:17,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1236992. Throughput: 0: 227.2. Samples: 310664. 
Policy #0 lag: (min: 1.0, avg: 1.1, max: 2.0) [2024-09-01 06:56:17,049][00307] Avg episode reward: [(0, '6.411')] [2024-09-01 06:56:19,148][04801] Signal inference workers to stop experience collection... (300 times) [2024-09-01 06:56:19,181][04814] InferenceWorker_p0-w0: stopping experience collection (300 times) [2024-09-01 06:56:20,208][04801] Signal inference workers to resume experience collection... (300 times) [2024-09-01 06:56:20,212][04814] InferenceWorker_p0-w0: resuming experience collection (300 times) [2024-09-01 06:56:22,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1241088. Throughput: 0: 212.3. Samples: 311524. Policy #0 lag: (min: 1.0, avg: 1.1, max: 2.0) [2024-09-01 06:56:22,051][00307] Avg episode reward: [(0, '6.587')] [2024-09-01 06:56:24,425][04801] Saving new best policy, reward=6.587! [2024-09-01 06:56:27,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1245184. Throughput: 0: 224.6. Samples: 313454. Policy #0 lag: (min: 1.0, avg: 1.1, max: 2.0) [2024-09-01 06:56:27,057][00307] Avg episode reward: [(0, '6.699')] [2024-09-01 06:56:32,050][00307] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 1249280. Throughput: 0: 224.3. Samples: 314028. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:56:32,058][00307] Avg episode reward: [(0, '6.992')] [2024-09-01 06:56:34,074][04801] Saving new best policy, reward=6.699! [2024-09-01 06:56:34,260][04801] Saving new best policy, reward=6.992! [2024-09-01 06:56:37,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1253376. Throughput: 0: 215.2. Samples: 315024. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:56:37,051][00307] Avg episode reward: [(0, '6.976')] [2024-09-01 06:56:42,046][00307] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 1257472. Throughput: 0: 218.2. Samples: 316638. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:56:42,057][00307] Avg episode reward: [(0, '6.993')] [2024-09-01 06:56:47,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1261568. Throughput: 0: 218.5. Samples: 317150. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:56:47,049][00307] Avg episode reward: [(0, '6.949')] [2024-09-01 06:56:52,046][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 1265664. Throughput: 0: 220.9. Samples: 318518. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:56:52,050][00307] Avg episode reward: [(0, '6.895')] [2024-09-01 06:56:52,786][04814] Updated weights for policy 0, policy_version 310 (0.1056) [2024-09-01 06:56:57,046][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 1269760. Throughput: 0: 216.8. Samples: 319830. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:56:57,056][00307] Avg episode reward: [(0, '7.210')] [2024-09-01 06:57:01,060][04801] Saving new best policy, reward=7.210! [2024-09-01 06:57:02,046][00307] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1277952. Throughput: 0: 223.6. Samples: 320728. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 06:57:02,051][00307] Avg episode reward: [(0, '7.062')] [2024-09-01 06:57:07,046][00307] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1282048. Throughput: 0: 228.3. Samples: 321796. 
Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:57:07,050][00307] Avg episode reward: [(0, '7.017')] [2024-09-01 06:57:12,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1286144. Throughput: 0: 213.2. Samples: 323050. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:57:12,048][00307] Avg episode reward: [(0, '7.259')] [2024-09-01 06:57:15,192][04801] Saving new best policy, reward=7.259! [2024-09-01 06:57:17,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1290240. Throughput: 0: 216.9. Samples: 323786. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:57:17,056][00307] Avg episode reward: [(0, '7.402')] [2024-09-01 06:57:19,885][04801] Saving new best policy, reward=7.402! [2024-09-01 06:57:22,049][00307] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 1294336. Throughput: 0: 224.7. Samples: 325134. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:57:22,052][00307] Avg episode reward: [(0, '7.251')] [2024-09-01 06:57:27,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1298432. Throughput: 0: 212.8. Samples: 326212. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:57:27,051][00307] Avg episode reward: [(0, '6.844')] [2024-09-01 06:57:32,046][00307] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1302528. Throughput: 0: 220.0. Samples: 327052. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:57:32,048][00307] Avg episode reward: [(0, '6.837')] [2024-09-01 06:57:37,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1306624. Throughput: 0: 224.0. Samples: 328600. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:57:37,055][00307] Avg episode reward: [(0, '6.744')] [2024-09-01 06:57:39,387][04814] Updated weights for policy 0, policy_version 320 (0.0064) [2024-09-01 06:57:42,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1310720. Throughput: 0: 217.2. Samples: 329602. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:57:42,050][00307] Avg episode reward: [(0, '6.713')] [2024-09-01 06:57:47,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1314816. Throughput: 0: 206.8. Samples: 330036. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:57:47,049][00307] Avg episode reward: [(0, '7.126')] [2024-09-01 06:57:52,046][00307] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1323008. Throughput: 0: 226.4. Samples: 331986. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:57:52,051][00307] Avg episode reward: [(0, '6.927')] [2024-09-01 06:57:57,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1323008. Throughput: 0: 221.4. Samples: 333014. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:57:57,053][00307] Avg episode reward: [(0, '7.361')] [2024-09-01 06:58:02,046][00307] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 1327104. Throughput: 0: 217.6. Samples: 333580. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:58:02,055][00307] Avg episode reward: [(0, '7.348')] [2024-09-01 06:58:07,046][00307] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1335296. Throughput: 0: 220.6. Samples: 335062. 
Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:58:07,049][00307] Avg episode reward: [(0, '7.589')] [2024-09-01 06:58:11,578][04801] Saving new best policy, reward=7.589! [2024-09-01 06:58:12,048][00307] Fps is (10 sec: 1228.5, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 1339392. Throughput: 0: 225.2. Samples: 336346. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:58:12,054][00307] Avg episode reward: [(0, '7.528')] [2024-09-01 06:58:16,685][04801] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000328_1343488.pth... [2024-09-01 06:58:16,787][04801] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000276_1130496.pth [2024-09-01 06:58:17,049][00307] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 1343488. Throughput: 0: 219.0. Samples: 336906. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:58:17,052][00307] Avg episode reward: [(0, '7.362')] [2024-09-01 06:58:22,047][00307] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1347584. Throughput: 0: 216.0. Samples: 338322. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 06:58:22,055][00307] Avg episode reward: [(0, '7.383')] [2024-09-01 06:58:24,819][04814] Updated weights for policy 0, policy_version 330 (0.1038) [2024-09-01 06:58:27,049][00307] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1351680. Throughput: 0: 223.5. Samples: 339660. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 06:58:27,060][00307] Avg episode reward: [(0, '7.867')] [2024-09-01 06:58:30,696][04801] Saving new best policy, reward=7.867! [2024-09-01 06:58:32,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1355776. Throughput: 0: 226.5. Samples: 340228. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 06:58:32,050][00307] Avg episode reward: [(0, '8.014')] [2024-09-01 06:58:36,976][04801] Saving new best policy, reward=8.014! [2024-09-01 06:58:37,048][00307] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1359872. Throughput: 0: 208.6. Samples: 341374. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 06:58:37,055][00307] Avg episode reward: [(0, '7.982')] [2024-09-01 06:58:42,047][00307] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 860.9). Total num frames: 1359872. Throughput: 0: 205.7. Samples: 342272. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 06:58:42,054][00307] Avg episode reward: [(0, '8.018')] [2024-09-01 06:58:47,047][00307] Fps is (10 sec: 0.0, 60 sec: 750.9, 300 sec: 847.0). Total num frames: 1359872. Throughput: 0: 193.2. Samples: 342276. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 06:58:47,061][00307] Avg episode reward: [(0, '8.018')] [2024-09-01 06:58:52,046][00307] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 847.0). Total num frames: 1363968. Throughput: 0: 168.4. Samples: 342640. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:58:52,049][00307] Avg episode reward: [(0, '8.154')] [2024-09-01 06:58:54,640][04801] Saving new best policy, reward=8.018! [2024-09-01 06:58:54,772][04801] Saving new best policy, reward=8.154! [2024-09-01 06:58:57,046][00307] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 847.0). Total num frames: 1368064. Throughput: 0: 174.9. Samples: 344218. 
Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:58:57,052][00307] Avg episode reward: [(0, '8.018')] [2024-09-01 06:59:02,046][00307] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 847.0). Total num frames: 1372160. Throughput: 0: 173.7. Samples: 344720. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:59:02,055][00307] Avg episode reward: [(0, '7.931')] [2024-09-01 06:59:07,046][00307] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 847.0). Total num frames: 1376256. Throughput: 0: 168.0. Samples: 345880. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:59:07,054][00307] Avg episode reward: [(0, '7.894')] [2024-09-01 06:59:12,046][00307] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 847.0). Total num frames: 1380352. Throughput: 0: 171.4. Samples: 347372. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:59:12,057][00307] Avg episode reward: [(0, '7.882')] [2024-09-01 06:59:17,046][00307] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 847.0). Total num frames: 1384448. Throughput: 0: 170.8. Samples: 347914. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:59:17,056][00307] Avg episode reward: [(0, '7.925')] [2024-09-01 06:59:21,796][04814] Updated weights for policy 0, policy_version 340 (0.1224) [2024-09-01 06:59:22,047][00307] Fps is (10 sec: 1228.8, 60 sec: 750.9, 300 sec: 860.9). Total num frames: 1392640. Throughput: 0: 177.9. Samples: 349380. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:59:22,056][00307] Avg episode reward: [(0, '7.947')] [2024-09-01 06:59:27,046][00307] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 833.1). Total num frames: 1392640. Throughput: 0: 184.8. Samples: 350586. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:59:27,054][00307] Avg episode reward: [(0, '8.102')] [2024-09-01 06:59:32,046][00307] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 847.0). Total num frames: 1400832. Throughput: 0: 202.8. Samples: 351400. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 06:59:32,049][00307] Avg episode reward: [(0, '8.073')] [2024-09-01 06:59:37,046][00307] Fps is (10 sec: 1228.8, 60 sec: 750.9, 300 sec: 860.9). Total num frames: 1404928. Throughput: 0: 221.1. Samples: 352590. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 06:59:37,055][00307] Avg episode reward: [(0, '7.779')] [2024-09-01 06:59:42,046][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1409024. Throughput: 0: 210.1. Samples: 353672. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 06:59:42,054][00307] Avg episode reward: [(0, '7.939')] [2024-09-01 06:59:47,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 1413120. Throughput: 0: 218.6. Samples: 354556. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:59:47,049][00307] Avg episode reward: [(0, '8.129')] [2024-09-01 06:59:52,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 1417216. Throughput: 0: 226.9. Samples: 356090. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 06:59:52,048][00307] Avg episode reward: [(0, '8.087')] [2024-09-01 06:59:57,047][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 1421312. Throughput: 0: 217.2. Samples: 357146. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 06:59:57,056][00307] Avg episode reward: [(0, '8.373')] [2024-09-01 07:00:00,005][04801] Saving new best policy, reward=8.373! 
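The paired "Saving …/checkpoint_XXXX.pth… / Removing …" lines that recur through this log (another pair appears just below) are the keep_checkpoints=2 rotation from the configuration: each periodic save writes checkpoint_<policy_version>_<env_frames>.pth and then deletes the oldest files beyond the keep limit. A rough sketch under those assumptions; `save_and_prune` is an invented helper, and the real learner serializes a torch state dict rather than raw bytes:

```python
from pathlib import Path

def save_and_prune(ckpt_dir, policy_version, env_frames, state_bytes,
                   keep_checkpoints=2):
    """Write checkpoint_<version>_<frames>.pth, then delete the oldest
    checkpoints beyond keep_checkpoints. Illustrative helper only."""
    ckpt_dir = Path(ckpt_dir)
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    path = ckpt_dir / f"checkpoint_{policy_version:09d}_{env_frames}.pth"
    print(f"Saving {path}...")
    path.write_bytes(state_bytes)

    # Zero-padded versions sort lexicographically in chronological order,
    # so the oldest checkpoints sit at the head of the sorted list.
    for stale in sorted(ckpt_dir.glob("checkpoint_*.pth"))[:-keep_checkpoints]:
        print(f"Removing {stale}")
        stale.unlink()
```

With keep_checkpoints=2 the directory always holds the two newest rotating checkpoints; best-policy snapshots are managed separately from this rotation.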
[2024-09-01 07:00:02,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 1425408. Throughput: 0: 219.9. Samples: 357810. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 07:00:02,050][00307] Avg episode reward: [(0, '8.411')] [2024-09-01 07:00:03,775][04801] Saving new best policy, reward=8.411! [2024-09-01 07:00:07,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 1429504. Throughput: 0: 219.2. Samples: 359246. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 07:00:07,048][00307] Avg episode reward: [(0, '8.428')] [2024-09-01 07:00:08,161][04801] Saving new best policy, reward=8.428! [2024-09-01 07:00:08,167][04814] Updated weights for policy 0, policy_version 350 (0.0702) [2024-09-01 07:00:12,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 1433600. Throughput: 0: 221.3. Samples: 360544. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 07:00:12,050][00307] Avg episode reward: [(0, '8.193')] [2024-09-01 07:00:17,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 1437696. Throughput: 0: 208.9. Samples: 360800. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 07:00:17,050][00307] Avg episode reward: [(0, '8.293')] [2024-09-01 07:00:18,400][04801] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000352_1441792.pth... [2024-09-01 07:00:18,503][04801] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000302_1236992.pth [2024-09-01 07:00:21,483][04801] Signal inference workers to stop experience collection... (350 times) [2024-09-01 07:00:21,510][04814] InferenceWorker_p0-w0: stopping experience collection (350 times) [2024-09-01 07:00:22,046][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1441792. Throughput: 0: 221.4. Samples: 362554. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2024-09-01 07:00:22,051][00307] Avg episode reward: [(0, '8.300')] [2024-09-01 07:00:22,571][04801] Signal inference workers to resume experience collection... (350 times) [2024-09-01 07:00:22,572][04814] InferenceWorker_p0-w0: resuming experience collection (350 times) [2024-09-01 07:00:27,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 1445888. Throughput: 0: 225.2. Samples: 363806. Policy #0 lag: (min: 1.0, avg: 1.1, max: 2.0) [2024-09-01 07:00:27,054][00307] Avg episode reward: [(0, '8.277')] [2024-09-01 07:00:32,046][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1449984. Throughput: 0: 215.8. Samples: 364268. Policy #0 lag: (min: 1.0, avg: 1.1, max: 2.0) [2024-09-01 07:00:32,049][00307] Avg episode reward: [(0, '8.523')] [2024-09-01 07:00:37,047][00307] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1454080. Throughput: 0: 212.9. Samples: 365672. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 07:00:37,059][00307] Avg episode reward: [(0, '8.631')] [2024-09-01 07:00:37,099][04801] Saving new best policy, reward=8.523! [2024-09-01 07:00:41,508][04801] Saving new best policy, reward=8.631! [2024-09-01 07:00:42,046][00307] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 1462272. Throughput: 0: 221.9. Samples: 367130. Policy #0 lag: (min: 1.0, avg: 1.1, max: 2.0) [2024-09-01 07:00:42,054][00307] Avg episode reward: [(0, '8.381')] [2024-09-01 07:00:47,046][00307] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 833.1). 
[2024-09-01 07:00:47,046][00307] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1462272. Throughput: 0: 218.3. Samples: 367632. Policy #0 lag: (min: 1.0, avg: 1.1, max: 2.0)
[2024-09-01 07:00:47,050][00307] Avg episode reward: [(0, '8.580')]
[2024-09-01 07:00:52,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 1470464. Throughput: 0: 211.8. Samples: 368776. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 07:00:52,049][00307] Avg episode reward: [(0, '8.550')]
[2024-09-01 07:00:56,321][04814] Updated weights for policy 0, policy_version 360 (0.0549)
[2024-09-01 07:00:57,046][00307] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 1474560. Throughput: 0: 212.8. Samples: 370122. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 07:00:57,049][00307] Avg episode reward: [(0, '8.633')]
[2024-09-01 07:01:01,331][04801] Saving new best policy, reward=8.633!
[2024-09-01 07:01:02,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 1478656. Throughput: 0: 224.4. Samples: 370896. Policy #0 lag: (min: 1.0, avg: 1.1, max: 2.0)
[2024-09-01 07:01:02,051][00307] Avg episode reward: [(0, '8.621')]
[2024-09-01 07:01:07,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 1482752. Throughput: 0: 207.6. Samples: 371894. Policy #0 lag: (min: 1.0, avg: 1.1, max: 2.0)
[2024-09-01 07:01:07,057][00307] Avg episode reward: [(0, '8.660')]
[2024-09-01 07:01:10,674][04801] Saving new best policy, reward=8.660!
[2024-09-01 07:01:12,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 1486848. Throughput: 0: 212.4. Samples: 373364. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 07:01:12,048][00307] Avg episode reward: [(0, '8.685')]
[2024-09-01 07:01:14,812][04801] Saving new best policy, reward=8.685!
[2024-09-01 07:01:17,049][00307] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 847.0). Total num frames: 1490944. Throughput: 0: 216.7. Samples: 374020. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 07:01:17,052][00307] Avg episode reward: [(0, '8.791')]
[2024-09-01 07:01:21,392][04801] Saving new best policy, reward=8.791!
[2024-09-01 07:01:22,048][00307] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 847.0). Total num frames: 1495040. Throughput: 0: 212.7. Samples: 375244. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 07:01:22,054][00307] Avg episode reward: [(0, '9.080')]
[2024-09-01 07:01:25,429][04801] Saving new best policy, reward=9.080!
[2024-09-01 07:01:27,046][00307] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 1499136. Throughput: 0: 207.5. Samples: 376466. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2024-09-01 07:01:27,049][00307] Avg episode reward: [(0, '9.598')]
[2024-09-01 07:01:29,870][04801] Saving new best policy, reward=9.598!
[2024-09-01 07:01:32,046][00307] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 1503232. Throughput: 0: 213.2. Samples: 377224. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 07:01:32,049][00307] Avg episode reward: [(0, '9.573')]
[2024-09-01 07:01:37,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 1507328. Throughput: 0: 211.9. Samples: 378312. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 07:01:37,053][00307] Avg episode reward: [(0, '9.372')]
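Each "Saving new best policy, reward=...!" line above fires when the smoothed episode reward beats the previous best; in this run that is gated by the save_best_metric=reward and save_best_after=100000 settings. A simplified stand-in for that logic (the class itself is an assumption, not Sample Factory's code):

    class BestPolicyTracker:
        def __init__(self, save_best_after=100_000, threshold=1e-9):
            self.save_best_after = save_best_after
            self.threshold = threshold
            self.best = float("-inf")

        def maybe_save(self, env_steps, avg_reward, save_fn):
            # don't save "best" snapshots on noisy early-training rewards
            if env_steps < self.save_best_after:
                return False
            if avg_reward > self.best + self.threshold:
                self.best = avg_reward
                save_fn(avg_reward)  # e.g. writes a best_..._reward_<r>.pth file
                return True
            return False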
[2024-09-01 07:01:42,046][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1511424. Throughput: 0: 211.8. Samples: 379654. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 07:01:42,050][00307] Avg episode reward: [(0, '9.218')]
[2024-09-01 07:01:45,015][04814] Updated weights for policy 0, policy_version 370 (0.1080)
[2024-09-01 07:01:47,046][00307] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 1515520. Throughput: 0: 208.5. Samples: 380280. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 07:01:47,051][00307] Avg episode reward: [(0, '9.132')]
[2024-09-01 07:01:52,046][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 1519616. Throughput: 0: 220.1. Samples: 381800. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 07:01:52,053][00307] Avg episode reward: [(0, '9.043')]
[2024-09-01 07:01:57,046][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1523712. Throughput: 0: 211.6. Samples: 382884. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 07:01:57,055][00307] Avg episode reward: [(0, '9.055')]
[2024-09-01 07:02:02,047][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1527808. Throughput: 0: 211.6. Samples: 383542. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 07:02:02,050][00307] Avg episode reward: [(0, '9.068')]
[2024-09-01 07:02:07,046][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1531904. Throughput: 0: 222.9. Samples: 385274. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 07:02:07,052][00307] Avg episode reward: [(0, '9.133')]
[2024-09-01 07:02:12,047][00307] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1536000. Throughput: 0: 210.8. Samples: 385952. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 07:02:12,051][00307] Avg episode reward: [(0, '9.096')]
[2024-09-01 07:02:12,061][00307] Components not started: RolloutWorker_w1, RolloutWorker_w5, RolloutWorker_w6, wait_time=1800.0 seconds
[2024-09-01 07:02:12,069][00307] Components take too long to start: RolloutWorker_w1, RolloutWorker_w5, RolloutWorker_w6. Aborting the experiment!
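The abort above is a startup watchdog firing: RolloutWorker_w1/w5/w6 never reported ready (their processes had already died, as the next lines show), and once wait_time hit 1800.0 seconds the runner gave up. This also appears to be why the run stops around 1.54M frames rather than the configured 4M. A hypothetical sketch of that check:

    import time

    def check_components(started: dict, start_time: float,
                         max_wait: float = 1800.0):
        """started maps component name -> bool (has it reported ready?)."""
        missing = [name for name, ok in started.items() if not ok]
        if missing and time.time() - start_time > max_wait:
            raise RuntimeError(
                f"Components take too long to start: {', '.join(missing)}. "
                "Aborting the experiment!"
            )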
[2024-09-01 07:02:12,075][04801] Stopping Batcher_0...
[2024-09-01 07:02:12,075][04801] Loop batcher_evt_loop terminating...
[2024-09-01 07:02:12,078][00307] Component Batcher_0 stopped!
[2024-09-01 07:02:12,087][00307] Component RolloutWorker_w1 process died already! Don't wait for it.
[2024-09-01 07:02:12,092][00307] Component RolloutWorker_w5 process died already! Don't wait for it.
[2024-09-01 07:02:12,100][00307] Component RolloutWorker_w6 process died already! Don't wait for it.
[2024-09-01 07:02:12,106][00307] Waiting for ['LearnerWorker_p0', 'InferenceWorker_p0-w0', 'RolloutWorker_w0', 'RolloutWorker_w2', 'RolloutWorker_w3', 'RolloutWorker_w4', 'RolloutWorker_w7'] to stop...
[2024-09-01 07:02:12,264][04814] Weights refcount: 2 0
[2024-09-01 07:02:12,274][00307] Component InferenceWorker_p0-w0 stopped!
[2024-09-01 07:02:12,281][00307] Waiting for ['LearnerWorker_p0', 'RolloutWorker_w0', 'RolloutWorker_w2', 'RolloutWorker_w3', 'RolloutWorker_w4', 'RolloutWorker_w7'] to stop...
[2024-09-01 07:02:12,302][04814] Stopping InferenceWorker_p0-w0...
[2024-09-01 07:02:12,308][04814] Loop inference_proc0-0_evt_loop terminating...
[2024-09-01 07:02:12,691][04822] Stopping RolloutWorker_w7...
[2024-09-01 07:02:12,690][00307] Component RolloutWorker_w7 stopped!
[2024-09-01 07:02:12,694][00307] Waiting for ['LearnerWorker_p0', 'RolloutWorker_w0', 'RolloutWorker_w2', 'RolloutWorker_w3', 'RolloutWorker_w4'] to stop...
[2024-09-01 07:02:12,692][04822] Loop rollout_proc7_evt_loop terminating...
[2024-09-01 07:02:12,746][00307] Component RolloutWorker_w2 stopped!
[2024-09-01 07:02:12,749][00307] Waiting for ['LearnerWorker_p0', 'RolloutWorker_w0', 'RolloutWorker_w3', 'RolloutWorker_w4'] to stop...
[2024-09-01 07:02:12,748][04818] Stopping RolloutWorker_w3...
[2024-09-01 07:02:12,754][04818] Loop rollout_proc3_evt_loop terminating...
[2024-09-01 07:02:12,759][04816] Stopping RolloutWorker_w2...
[2024-09-01 07:02:12,760][04816] Loop rollout_proc2_evt_loop terminating...
[2024-09-01 07:02:12,752][00307] Component RolloutWorker_w3 stopped!
[2024-09-01 07:02:12,763][00307] Waiting for ['LearnerWorker_p0', 'RolloutWorker_w0', 'RolloutWorker_w4'] to stop...
[2024-09-01 07:02:12,783][00307] Component RolloutWorker_w4 stopped!
[2024-09-01 07:02:12,790][04819] Stopping RolloutWorker_w4...
[2024-09-01 07:02:12,791][04819] Loop rollout_proc4_evt_loop terminating...
[2024-09-01 07:02:12,785][00307] Waiting for ['LearnerWorker_p0', 'RolloutWorker_w0'] to stop...
[2024-09-01 07:02:12,828][00307] Component RolloutWorker_w0 stopped!
[2024-09-01 07:02:12,831][00307] Waiting for ['LearnerWorker_p0'] to stop...
[2024-09-01 07:02:12,837][04815] Stopping RolloutWorker_w0...
[2024-09-01 07:02:12,838][04815] Loop rollout_proc0_evt_loop terminating...
[2024-09-01 07:02:13,369][04801] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000376_1540096.pth...
[2024-09-01 07:02:13,491][04801] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000328_1343488.pth
[2024-09-01 07:02:13,508][04801] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000376_1540096.pth...
[2024-09-01 07:02:13,686][04801] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000376_1540096.pth...
[2024-09-01 07:02:13,857][04801] Stopping LearnerWorker_p0...
[2024-09-01 07:02:13,857][04801] Loop learner_proc0_evt_loop terminating...
[2024-09-01 07:02:13,859][00307] Component LearnerWorker_p0 stopped!
[2024-09-01 07:02:13,867][00307] Waiting for process learner_proc0 to stop...
[2024-09-01 07:02:14,692][00307] Waiting for process inference_proc0-0 to join...
[2024-09-01 07:02:14,704][00307] Waiting for process rollout_proc0 to join...
[2024-09-01 07:02:15,571][00307] Waiting for process rollout_proc1 to join...
[2024-09-01 07:02:15,574][00307] Waiting for process rollout_proc2 to join...
[2024-09-01 07:02:15,578][00307] Waiting for process rollout_proc3 to join...
[2024-09-01 07:02:15,587][00307] Waiting for process rollout_proc4 to join...
[2024-09-01 07:02:15,592][00307] Waiting for process rollout_proc5 to join...
[2024-09-01 07:02:15,594][00307] Waiting for process rollout_proc6 to join...
[2024-09-01 07:02:15,598][00307] Waiting for process rollout_proc7 to join...
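The shutdown above proceeds in two phases: live components are signaled and waited on one by one (dead ones are skipped with "process died already! Don't wait for it."), and finally the runner joins every child process. A simplified sketch of that teardown, not the runner's actual event-loop code:

    import multiprocessing as mp

    def stop_and_join(procs: dict[str, mp.Process], timeout: float = 5.0):
        # phase 1: flag components whose processes are already gone
        for name, p in procs.items():
            if not p.is_alive() and p.exitcode is not None:
                print(f"Component {name} process died already! Don't wait for it.")
        # phase 2: join the survivors, escalating only if a join times out
        for name, p in procs.items():
            if p.is_alive():
                p.join(timeout)
                if p.is_alive():
                    p.terminate()  # last resort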
[2024-09-01 07:02:15,601][00307] Batcher 0 profile tree view:
batching: 7.2864, releasing_batches: 0.1124
[2024-09-01 07:02:15,607][00307] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0001
  wait_policy_total: 68.0720
update_model: 45.9865
  weight_update: 0.1255
one_step: 0.0551
  handle_policy_step: 1285.7323
    deserialize: 18.0612, stack: 3.3590, obs_to_device_normalize: 133.9782, forward: 1044.8374, send_messages: 26.8132
    prepare_outputs: 30.6331
      to_cpu: 3.5112
[2024-09-01 07:02:15,609][00307] Learner 0 profile tree view:
misc: 0.0023, prepare_batch: 444.6382
train: 1309.7648
  epoch_init: 0.0050, minibatch_init: 0.0040, losses_postprocess: 0.0519, kl_divergence: 0.2532, after_optimizer: 0.9695
  calculate_losses: 543.4052
    losses_init: 0.0017, forward_head: 475.6739, bptt_initial: 1.9416, tail: 1.2634, advantages_returns: 0.1093, losses: 0.6511
    bptt: 63.5370
      bptt_forward_core: 63.1447
  update: 764.8126
    clip: 1.5666
[2024-09-01 07:02:15,612][00307] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3756, enqueue_policy_requests: 43.7998, env_step: 958.1747, overhead: 21.8893, complete_rollouts: 9.7052
save_policy_outputs: 37.9437
  split_output_tensors: 12.8258
[2024-09-01 07:02:15,614][00307] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.4284, enqueue_policy_requests: 54.3901, env_step: 923.3749, overhead: 19.9329, complete_rollouts: 7.7933
save_policy_outputs: 34.3538
  split_output_tensors: 11.3254
[2024-09-01 07:02:15,617][00307] Loop Runner_EvtLoop terminating...
[2024-09-01 07:02:15,620][00307] Runner profile tree view:
main_loop: 1800.7612
[2024-09-01 07:02:15,623][00307] Collected {0: 1540096}, FPS: 855.2
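Sanity check on the summary line: the reported figure is simply total collected frames divided by main-loop wall time, 1540096 / 1800.7612 s ≈ 855.2 FPS, consistent with the 300-second window values (roughly 833 to 861) logged during training.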
[2024-09-01 07:02:15,692][00307] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-01 07:02:15,695][00307] Overriding arg 'num_workers' with value 1 passed from command line
[2024-09-01 07:02:15,697][00307] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-09-01 07:02:15,700][00307] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-09-01 07:02:15,702][00307] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-09-01 07:02:15,704][00307] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-09-01 07:02:15,706][00307] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-09-01 07:02:15,707][00307] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-09-01 07:02:15,708][00307] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-09-01 07:02:15,709][00307] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-09-01 07:02:15,711][00307] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-09-01 07:02:15,712][00307] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-09-01 07:02:15,713][00307] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-09-01 07:02:15,715][00307] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-09-01 07:02:15,716][00307] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-09-01 07:02:15,754][00307] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 07:02:15,757][00307] RunningMeanStd input shape: (3, 72, 128)
[2024-09-01 07:02:15,762][00307] RunningMeanStd input shape: (1,)
[2024-09-01 07:02:15,796][00307] ConvEncoder: input_channels=3
[2024-09-01 07:02:15,986][00307] Conv encoder output size: 512
[2024-09-01 07:02:15,988][00307] Policy head output size: 512
[2024-09-01 07:02:16,014][00307] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000376_1540096.pth...
[2024-09-01 07:02:16,694][00307] Num frames 100...
[2024-09-01 07:02:16,886][00307] Num frames 200...
[2024-09-01 07:02:17,098][00307] Num frames 300...
[2024-09-01 07:02:17,325][00307] Num frames 400...
[2024-09-01 07:02:17,482][00307] Avg episode rewards: #0: 7.480, true rewards: #0: 4.480
[2024-09-01 07:02:17,487][00307] Avg episode reward: 7.480, avg true_objective: 4.480
[2024-09-01 07:02:17,596][00307] Num frames 500...
[2024-09-01 07:02:17,790][00307] Num frames 600...
[2024-09-01 07:02:17,979][00307] Num frames 700...
[2024-09-01 07:02:18,194][00307] Num frames 800...
[2024-09-01 07:02:18,414][00307] Num frames 900...
[2024-09-01 07:02:18,610][00307] Num frames 1000...
[2024-09-01 07:02:18,809][00307] Num frames 1100...
[2024-09-01 07:02:18,998][00307] Num frames 1200...
[2024-09-01 07:02:19,090][00307] Avg episode rewards: #0: 10.580, true rewards: #0: 6.080
[2024-09-01 07:02:19,092][00307] Avg episode reward: 10.580, avg true_objective: 6.080
[2024-09-01 07:02:19,251][00307] Num frames 1300...
[2024-09-01 07:02:19,453][00307] Num frames 1400...
[2024-09-01 07:02:19,646][00307] Num frames 1500...
[2024-09-01 07:02:19,843][00307] Num frames 1600...
[2024-09-01 07:02:20,039][00307] Num frames 1700...
[2024-09-01 07:02:20,238][00307] Num frames 1800...
[2024-09-01 07:02:20,434][00307] Num frames 1900...
[2024-09-01 07:02:20,627][00307] Num frames 2000...
[2024-09-01 07:02:20,829][00307] Num frames 2100...
[2024-09-01 07:02:21,030][00307] Num frames 2200...
[2024-09-01 07:02:21,239][00307] Num frames 2300...
[2024-09-01 07:02:21,430][00307] Avg episode rewards: #0: 14.227, true rewards: #0: 7.893
[2024-09-01 07:02:21,432][00307] Avg episode reward: 14.227, avg true_objective: 7.893
[2024-09-01 07:02:21,502][00307] Num frames 2400...
[2024-09-01 07:02:21,693][00307] Num frames 2500...
[2024-09-01 07:02:21,892][00307] Num frames 2600...
[2024-09-01 07:02:22,088][00307] Num frames 2700...
[2024-09-01 07:02:22,180][00307] Avg episode rewards: #0: 11.788, true rewards: #0: 6.787
[2024-09-01 07:02:22,182][00307] Avg episode reward: 11.788, avg true_objective: 6.787
[2024-09-01 07:02:22,350][00307] Num frames 2800...
[2024-09-01 07:02:22,569][00307] Num frames 2900...
[2024-09-01 07:02:22,845][00307] Num frames 3000...
[2024-09-01 07:02:22,919][00307] Avg episode rewards: #0: 10.406, true rewards: #0: 6.006
[2024-09-01 07:02:22,922][00307] Avg episode reward: 10.406, avg true_objective: 6.006
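The two RunningMeanStd lines above are streaming normalizers, one over the (3, 72, 128) observations and one over scalar returns. A standard formulation using the parallel mean/variance update rule; Sample Factory's own class may differ in details:

    import numpy as np

    class RunningMeanStd:
        def __init__(self, shape, eps=1e-4):
            self.mean = np.zeros(shape, dtype=np.float64)
            self.var = np.ones(shape, dtype=np.float64)
            self.count = eps  # small prior count avoids divide-by-zero

        def update(self, batch: np.ndarray):
            """batch has a leading batch axis, e.g. (N, 3, 72, 128)."""
            b_mean = batch.mean(axis=0)
            b_var = batch.var(axis=0)
            b_count = batch.shape[0]
            delta = b_mean - self.mean
            tot = self.count + b_count
            self.mean += delta * b_count / tot
            m2 = (self.var * self.count + b_var * b_count
                  + delta**2 * self.count * b_count / tot)
            self.var = m2 / tot
            self.count = tot

        def normalize(self, x):
            return (x - self.mean) / np.sqrt(self.var + 1e-8)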
[2024-09-01 07:02:23,197][00307] Num frames 3100...
[2024-09-01 07:02:23,478][00307] Num frames 3200...
[2024-09-01 07:02:23,744][00307] Num frames 3300...
[2024-09-01 07:02:24,006][00307] Num frames 3400...
[2024-09-01 07:02:24,298][00307] Num frames 3500...
[2024-09-01 07:02:24,567][00307] Num frames 3600...
[2024-09-01 07:02:24,838][00307] Num frames 3700...
[2024-09-01 07:02:25,114][00307] Num frames 3800...
[2024-09-01 07:02:25,403][00307] Num frames 3900...
[2024-09-01 07:02:25,696][00307] Num frames 4000...
[2024-09-01 07:02:25,904][00307] Num frames 4100...
[2024-09-01 07:02:26,107][00307] Num frames 4200...
[2024-09-01 07:02:26,304][00307] Num frames 4300...
[2024-09-01 07:02:26,391][00307] Avg episode rewards: #0: 13.025, true rewards: #0: 7.192
[2024-09-01 07:02:26,393][00307] Avg episode reward: 13.025, avg true_objective: 7.192
[2024-09-01 07:02:26,557][00307] Num frames 4400...
[2024-09-01 07:02:26,751][00307] Num frames 4500...
[2024-09-01 07:02:26,939][00307] Num frames 4600...
[2024-09-01 07:02:27,144][00307] Num frames 4700...
[2024-09-01 07:02:27,346][00307] Num frames 4800...
[2024-09-01 07:02:27,599][00307] Avg episode rewards: #0: 12.559, true rewards: #0: 6.987
[2024-09-01 07:02:27,602][00307] Avg episode reward: 12.559, avg true_objective: 6.987
[2024-09-01 07:02:27,624][00307] Num frames 4900...
[2024-09-01 07:02:27,810][00307] Num frames 5000...
[2024-09-01 07:02:28,004][00307] Num frames 5100...
[2024-09-01 07:02:28,197][00307] Num frames 5200...
[2024-09-01 07:02:28,405][00307] Num frames 5300...
[2024-09-01 07:02:28,607][00307] Num frames 5400...
[2024-09-01 07:02:28,670][00307] Avg episode rewards: #0: 12.001, true rewards: #0: 6.751
[2024-09-01 07:02:28,673][00307] Avg episode reward: 12.001, avg true_objective: 6.751
[2024-09-01 07:02:28,894][00307] Num frames 5500...
[2024-09-01 07:02:29,085][00307] Num frames 5600...
[2024-09-01 07:02:29,277][00307] Num frames 5700...
[2024-09-01 07:02:29,481][00307] Num frames 5800...
[2024-09-01 07:02:29,674][00307] Num frames 5900...
[2024-09-01 07:02:29,878][00307] Avg episode rewards: #0: 11.755, true rewards: #0: 6.643
[2024-09-01 07:02:29,881][00307] Avg episode reward: 11.755, avg true_objective: 6.643
[2024-09-01 07:02:29,924][00307] Num frames 6000...
[2024-09-01 07:02:30,122][00307] Num frames 6100...
[2024-09-01 07:02:30,319][00307] Num frames 6200...
[2024-09-01 07:02:30,535][00307] Num frames 6300...
[2024-09-01 07:02:30,713][00307] Avg episode rewards: #0: 10.963, true rewards: #0: 6.363
[2024-09-01 07:02:30,715][00307] Avg episode reward: 10.963, avg true_objective: 6.363
[2024-09-01 07:03:14,376][00307] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
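The "Replay video saved" line comes from buffering rendered frames during the evaluation episodes and encoding them once at the end; the long gap between the last episode and the save line is the encode. An illustrative sketch using imageio as a stand-in encoder (assumes the ffmpeg plugin is installed; not necessarily the library's own writer), with fps=35 mirroring the run's fps setting:

    import imageio
    import numpy as np

    def save_replay(frames, path, fps=35):
        """frames: list of HxWx3 uint8 arrays collected from env renders."""
        imageio.mimsave(path, frames, fps=fps)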
[2024-09-01 07:03:14,432][00307] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-01 07:03:14,434][00307] Overriding arg 'num_workers' with value 1 passed from command line
[2024-09-01 07:03:14,437][00307] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-09-01 07:03:14,440][00307] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-09-01 07:03:14,444][00307] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-09-01 07:03:14,447][00307] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-09-01 07:03:14,449][00307] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-09-01 07:03:14,453][00307] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-09-01 07:03:14,454][00307] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-09-01 07:03:14,456][00307] Adding new argument 'hf_repository'='jarski/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-09-01 07:03:14,457][00307] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-09-01 07:03:14,462][00307] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-09-01 07:03:14,464][00307] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-09-01 07:03:14,466][00307] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-09-01 07:03:14,470][00307] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-09-01 07:03:14,484][00307] RunningMeanStd input shape: (3, 72, 128)
[2024-09-01 07:03:14,487][00307] RunningMeanStd input shape: (1,)
[2024-09-01 07:03:14,505][00307] ConvEncoder: input_channels=3
[2024-09-01 07:03:14,566][00307] Conv encoder output size: 512
[2024-09-01 07:03:14,568][00307] Policy head output size: 512
[2024-09-01 07:03:14,596][00307] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000376_1540096.pth...
[2024-09-01 07:03:15,146][00307] Num frames 100...
[2024-09-01 07:03:15,357][00307] Num frames 200...
[2024-09-01 07:03:15,546][00307] Num frames 300...
[2024-09-01 07:03:15,759][00307] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
[2024-09-01 07:03:15,761][00307] Avg episode reward: 3.840, avg true_objective: 3.840
[2024-09-01 07:03:15,797][00307] Num frames 400...
[2024-09-01 07:03:15,991][00307] Num frames 500...
[2024-09-01 07:03:16,182][00307] Num frames 600...
[2024-09-01 07:03:16,333][00307] Avg episode rewards: #0: 3.200, true rewards: #0: 3.200
[2024-09-01 07:03:16,336][00307] Avg episode reward: 3.200, avg true_objective: 3.200
[2024-09-01 07:03:16,466][00307] Num frames 700...
[2024-09-01 07:03:16,732][00307] Num frames 800...
[2024-09-01 07:03:16,994][00307] Num frames 900...
[2024-09-01 07:03:17,242][00307] Num frames 1000...
[2024-09-01 07:03:17,510][00307] Num frames 1100...
[2024-09-01 07:03:17,777][00307] Avg episode rewards: #0: 4.613, true rewards: #0: 3.947
[2024-09-01 07:03:17,782][00307] Avg episode reward: 4.613, avg true_objective: 3.947
[2024-09-01 07:03:17,830][00307] Num frames 1200...
[2024-09-01 07:03:18,086][00307] Num frames 1300...
[2024-09-01 07:03:18,360][00307] Num frames 1400...
[2024-09-01 07:03:18,624][00307] Num frames 1500...
[2024-09-01 07:03:18,890][00307] Num frames 1600...
[2024-09-01 07:03:19,161][00307] Num frames 1700...
[2024-09-01 07:03:19,457][00307] Num frames 1800...
[2024-09-01 07:03:19,703][00307] Num frames 1900...
[2024-09-01 07:03:19,800][00307] Avg episode rewards: #0: 7.050, true rewards: #0: 4.800
[2024-09-01 07:03:19,802][00307] Avg episode reward: 7.050, avg true_objective: 4.800
[2024-09-01 07:03:19,954][00307] Num frames 2000...
[2024-09-01 07:03:20,140][00307] Num frames 2100...
[2024-09-01 07:03:20,322][00307] Num frames 2200...
[2024-09-01 07:03:20,525][00307] Num frames 2300...
[2024-09-01 07:03:20,716][00307] Num frames 2400...
[2024-09-01 07:03:20,904][00307] Num frames 2500...
[2024-09-01 07:03:21,097][00307] Num frames 2600...
[2024-09-01 07:03:21,287][00307] Num frames 2700...
[2024-09-01 07:03:21,507][00307] Avg episode rewards: #0: 9.168, true rewards: #0: 5.568
[2024-09-01 07:03:21,510][00307] Avg episode reward: 9.168, avg true_objective: 5.568
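The "Avg episode rewards" lines are cumulative means over the episodes finished so far, updated after every episode. A tiny check against this run's first two episodes above: 3.840 after episode 1 and 3.200 after episode 2 imply the second episode scored 2.560, since (3.840 + 2.560) / 2 = 3.200.

    def running_means(rewards):
        total, out = 0.0, []
        for n, r in enumerate(rewards, start=1):
            total += r
            out.append(total / n)
        return out

    print(running_means([3.840, 2.560]))  # [3.84, 3.2]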
[2024-09-01 07:03:21,544][00307] Num frames 2800...
[2024-09-01 07:03:21,729][00307] Num frames 2900...
[2024-09-01 07:03:21,915][00307] Num frames 3000...
[2024-09-01 07:03:22,098][00307] Num frames 3100...
[2024-09-01 07:03:22,286][00307] Num frames 3200...
[2024-09-01 07:03:22,479][00307] Num frames 3300...
[2024-09-01 07:03:22,596][00307] Avg episode rewards: #0: 8.880, true rewards: #0: 5.547
[2024-09-01 07:03:22,599][00307] Avg episode reward: 8.880, avg true_objective: 5.547
[2024-09-01 07:03:22,735][00307] Num frames 3400...
[2024-09-01 07:03:22,925][00307] Num frames 3500...
[2024-09-01 07:03:23,109][00307] Num frames 3600...
[2024-09-01 07:03:23,300][00307] Num frames 3700...
[2024-09-01 07:03:23,464][00307] Avg episode rewards: #0: 8.366, true rewards: #0: 5.366
[2024-09-01 07:03:23,466][00307] Avg episode reward: 8.366, avg true_objective: 5.366
[2024-09-01 07:03:23,561][00307] Num frames 3800...
[2024-09-01 07:03:23,741][00307] Num frames 3900...
[2024-09-01 07:03:23,926][00307] Num frames 4000...
[2024-09-01 07:03:24,118][00307] Num frames 4100...
[2024-09-01 07:03:24,308][00307] Num frames 4200...
[2024-09-01 07:03:24,494][00307] Num frames 4300...
[2024-09-01 07:03:24,732][00307] Avg episode rewards: #0: 8.620, true rewards: #0: 5.495
[2024-09-01 07:03:24,735][00307] Avg episode reward: 8.620, avg true_objective: 5.495
[2024-09-01 07:03:24,745][00307] Num frames 4400...
[2024-09-01 07:03:24,932][00307] Num frames 4500...
[2024-09-01 07:03:25,112][00307] Num frames 4600...
[2024-09-01 07:03:25,300][00307] Num frames 4700...
[2024-09-01 07:03:25,486][00307] Num frames 4800...
[2024-09-01 07:03:25,620][00307] Avg episode rewards: #0: 8.271, true rewards: #0: 5.382
[2024-09-01 07:03:25,623][00307] Avg episode reward: 8.271, avg true_objective: 5.382
[2024-09-01 07:03:25,726][00307] Num frames 4900...
[2024-09-01 07:03:25,906][00307] Num frames 5000...
[2024-09-01 07:03:26,087][00307] Num frames 5100...
[2024-09-01 07:03:26,268][00307] Num frames 5200...
[2024-09-01 07:03:26,463][00307] Num frames 5300...
[2024-09-01 07:03:26,662][00307] Num frames 5400...
[2024-09-01 07:03:26,847][00307] Num frames 5500...
[2024-09-01 07:03:27,038][00307] Num frames 5600...
[2024-09-01 07:03:27,214][00307] Num frames 5700...
[2024-09-01 07:03:27,285][00307] Avg episode rewards: #0: 8.908, true rewards: #0: 5.708
[2024-09-01 07:03:27,288][00307] Avg episode reward: 8.908, avg true_objective: 5.708
[2024-09-01 07:04:05,352][00307] Replay video saved to /content/train_dir/default_experiment/replay.mp4!