diff --git "a/sf_log.txt" "b/sf_log.txt" new file mode 100644--- /dev/null +++ "b/sf_log.txt" @@ -0,0 +1,1275 @@ +[2024-12-06 21:35:23,518][00194] Saving configuration to /content/train_dir/default_experiment/config.json... +[2024-12-06 21:35:23,520][00194] Rollout worker 0 uses device cpu +[2024-12-06 21:35:23,521][00194] Rollout worker 1 uses device cpu +[2024-12-06 21:35:23,522][00194] Rollout worker 2 uses device cpu +[2024-12-06 21:35:23,524][00194] Rollout worker 3 uses device cpu +[2024-12-06 21:35:23,525][00194] Rollout worker 4 uses device cpu +[2024-12-06 21:35:23,526][00194] Rollout worker 5 uses device cpu +[2024-12-06 21:35:23,527][00194] Rollout worker 6 uses device cpu +[2024-12-06 21:35:23,529][00194] Rollout worker 7 uses device cpu +[2024-12-06 21:35:23,686][00194] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-12-06 21:35:23,688][00194] InferenceWorker_p0-w0: min num requests: 2 +[2024-12-06 21:35:23,725][00194] Starting all processes... +[2024-12-06 21:35:23,726][00194] Starting process learner_proc0 +[2024-12-06 21:35:23,770][00194] Starting all processes... +[2024-12-06 21:35:23,779][00194] Starting process inference_proc0-0 +[2024-12-06 21:35:23,779][00194] Starting process rollout_proc0 +[2024-12-06 21:35:23,781][00194] Starting process rollout_proc1 +[2024-12-06 21:35:23,782][00194] Starting process rollout_proc2 +[2024-12-06 21:35:23,782][00194] Starting process rollout_proc3 +[2024-12-06 21:35:23,782][00194] Starting process rollout_proc4 +[2024-12-06 21:35:23,782][00194] Starting process rollout_proc5 +[2024-12-06 21:35:23,782][00194] Starting process rollout_proc6 +[2024-12-06 21:35:23,782][00194] Starting process rollout_proc7 +[2024-12-06 21:35:39,719][03835] Worker 0 uses CPU cores [0] +[2024-12-06 21:35:39,969][03836] Worker 1 uses CPU cores [1] +[2024-12-06 21:35:40,280][03840] Worker 4 uses CPU cores [0] +[2024-12-06 21:35:40,293][03838] Worker 3 uses CPU cores [1] +[2024-12-06 21:35:40,333][03821] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-12-06 21:35:40,334][03821] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-12-06 21:35:40,343][03834] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-12-06 21:35:40,343][03834] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-12-06 21:35:40,428][03821] Num visible devices: 1 +[2024-12-06 21:35:40,430][03834] Num visible devices: 1 +[2024-12-06 21:35:40,452][03821] Starting seed is not provided +[2024-12-06 21:35:40,452][03821] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-12-06 21:35:40,453][03821] Initializing actor-critic model on device cuda:0 +[2024-12-06 21:35:40,455][03821] RunningMeanStd input shape: (3, 72, 128) +[2024-12-06 21:35:40,468][03821] RunningMeanStd input shape: (1,) +[2024-12-06 21:35:40,551][03821] ConvEncoder: input_channels=3 +[2024-12-06 21:35:40,758][03842] Worker 7 uses CPU cores [1] +[2024-12-06 21:35:40,784][03837] Worker 2 uses CPU cores [0] +[2024-12-06 21:35:40,851][03839] Worker 5 uses CPU cores [1] +[2024-12-06 21:35:40,908][03841] Worker 6 uses CPU cores [0] +[2024-12-06 21:35:41,269][03821] Conv encoder output size: 512 +[2024-12-06 21:35:41,270][03821] Policy head output size: 512 +[2024-12-06 21:35:41,351][03821] Created Actor Critic model with architecture: +[2024-12-06 21:35:41,352][03821] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + 
(running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-12-06 21:35:42,075][03821] Using optimizer +[2024-12-06 21:35:43,687][00194] Heartbeat connected on InferenceWorker_p0-w0 +[2024-12-06 21:35:43,697][00194] Heartbeat connected on RolloutWorker_w0 +[2024-12-06 21:35:43,701][00194] Heartbeat connected on RolloutWorker_w1 +[2024-12-06 21:35:43,710][00194] Heartbeat connected on RolloutWorker_w2 +[2024-12-06 21:35:43,719][00194] Heartbeat connected on RolloutWorker_w3 +[2024-12-06 21:35:43,721][00194] Heartbeat connected on RolloutWorker_w5 +[2024-12-06 21:35:43,730][00194] Heartbeat connected on RolloutWorker_w4 +[2024-12-06 21:35:43,731][00194] Heartbeat connected on RolloutWorker_w6 +[2024-12-06 21:35:43,736][00194] Heartbeat connected on RolloutWorker_w7 +[2024-12-06 21:35:43,767][00194] Heartbeat connected on Batcher_0 +[2024-12-06 21:35:47,367][03821] No checkpoints found +[2024-12-06 21:35:47,367][03821] Did not load from checkpoint, starting from scratch! +[2024-12-06 21:35:47,367][03821] Initialized policy 0 weights for model version 0 +[2024-12-06 21:35:47,371][03821] LearnerWorker_p0 finished initialization! +[2024-12-06 21:35:47,371][00194] Heartbeat connected on LearnerWorker_p0 +[2024-12-06 21:35:47,377][03821] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-12-06 21:35:47,560][03834] RunningMeanStd input shape: (3, 72, 128) +[2024-12-06 21:35:47,562][03834] RunningMeanStd input shape: (1,) +[2024-12-06 21:35:47,573][03834] ConvEncoder: input_channels=3 +[2024-12-06 21:35:47,676][03834] Conv encoder output size: 512 +[2024-12-06 21:35:47,676][03834] Policy head output size: 512 +[2024-12-06 21:35:47,728][00194] Inference worker 0-0 is ready! +[2024-12-06 21:35:47,729][00194] All inference workers are ready! Signal rollout workers to start! +[2024-12-06 21:35:47,932][03836] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-12-06 21:35:47,933][03839] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-12-06 21:35:47,935][03838] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-12-06 21:35:47,935][03842] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-12-06 21:35:47,950][03841] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-12-06 21:35:47,951][00194] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-12-06 21:35:47,953][03840] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-12-06 21:35:47,958][03837] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-12-06 21:35:47,957][03835] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-12-06 21:35:48,987][03841] Decorrelating experience for 0 frames... +[2024-12-06 21:35:48,989][03840] Decorrelating experience for 0 frames... +[2024-12-06 21:35:49,211][03836] Decorrelating experience for 0 frames... +[2024-12-06 21:35:49,220][03838] Decorrelating experience for 0 frames... +[2024-12-06 21:35:49,217][03842] Decorrelating experience for 0 frames... +[2024-12-06 21:35:49,968][03836] Decorrelating experience for 32 frames... +[2024-12-06 21:35:49,971][03842] Decorrelating experience for 32 frames... +[2024-12-06 21:35:49,984][03840] Decorrelating experience for 32 frames... +[2024-12-06 21:35:49,996][03841] Decorrelating experience for 32 frames... +[2024-12-06 21:35:50,379][03837] Decorrelating experience for 0 frames... +[2024-12-06 21:35:51,085][03840] Decorrelating experience for 64 frames... +[2024-12-06 21:35:51,103][03838] Decorrelating experience for 32 frames... +[2024-12-06 21:35:51,166][03837] Decorrelating experience for 32 frames... +[2024-12-06 21:35:51,408][03836] Decorrelating experience for 64 frames... +[2024-12-06 21:35:51,582][03839] Decorrelating experience for 0 frames... +[2024-12-06 21:35:52,007][03837] Decorrelating experience for 64 frames... +[2024-12-06 21:35:52,261][03838] Decorrelating experience for 64 frames... +[2024-12-06 21:35:52,319][03836] Decorrelating experience for 96 frames... +[2024-12-06 21:35:52,389][03840] Decorrelating experience for 96 frames... +[2024-12-06 21:35:52,639][03835] Decorrelating experience for 0 frames... +[2024-12-06 21:35:52,951][00194] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-12-06 21:35:53,416][03837] Decorrelating experience for 96 frames... +[2024-12-06 21:35:53,727][03835] Decorrelating experience for 32 frames... +[2024-12-06 21:35:53,739][03841] Decorrelating experience for 64 frames... +[2024-12-06 21:35:53,848][03838] Decorrelating experience for 96 frames... +[2024-12-06 21:35:53,931][03842] Decorrelating experience for 64 frames... +[2024-12-06 21:35:55,441][03839] Decorrelating experience for 32 frames... +[2024-12-06 21:35:55,634][03841] Decorrelating experience for 96 frames... +[2024-12-06 21:35:56,112][03835] Decorrelating experience for 64 frames... +[2024-12-06 21:35:56,636][03842] Decorrelating experience for 96 frames... +[2024-12-06 21:35:57,952][00194] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 95.4. Samples: 954. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-12-06 21:35:57,954][00194] Avg episode reward: [(0, '1.615')] +[2024-12-06 21:36:01,210][03835] Decorrelating experience for 96 frames... +[2024-12-06 21:36:01,466][03839] Decorrelating experience for 64 frames... +[2024-12-06 21:36:01,826][03821] Signal inference workers to stop experience collection... +[2024-12-06 21:36:01,836][03834] InferenceWorker_p0-w0: stopping experience collection +[2024-12-06 21:36:02,079][03839] Decorrelating experience for 96 frames... +[2024-12-06 21:36:02,951][00194] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 206.7. Samples: 3100. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-12-06 21:36:02,956][00194] Avg episode reward: [(0, '2.926')] +[2024-12-06 21:36:03,863][03821] Signal inference workers to resume experience collection... +[2024-12-06 21:36:03,864][03834] InferenceWorker_p0-w0: resuming experience collection +[2024-12-06 21:36:07,951][00194] Fps is (10 sec: 2867.4, 60 sec: 1433.6, 300 sec: 1433.6). Total num frames: 28672. Throughput: 0: 248.3. Samples: 4966. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-12-06 21:36:07,953][00194] Avg episode reward: [(0, '3.447')] +[2024-12-06 21:36:10,422][03834] Updated weights for policy 0, policy_version 10 (0.0029) +[2024-12-06 21:36:12,954][00194] Fps is (10 sec: 4504.3, 60 sec: 1802.0, 300 sec: 1802.0). Total num frames: 45056. Throughput: 0: 471.1. Samples: 11780. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-06 21:36:12,956][00194] Avg episode reward: [(0, '4.187')] +[2024-12-06 21:36:17,951][00194] Fps is (10 sec: 2867.2, 60 sec: 1911.5, 300 sec: 1911.5). Total num frames: 57344. Throughput: 0: 509.1. Samples: 15272. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:36:17,956][00194] Avg episode reward: [(0, '4.472')] +[2024-12-06 21:36:22,621][03834] Updated weights for policy 0, policy_version 20 (0.0020) +[2024-12-06 21:36:22,951][00194] Fps is (10 sec: 3687.5, 60 sec: 2340.6, 300 sec: 2340.6). Total num frames: 81920. Throughput: 0: 528.5. Samples: 18496. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-12-06 21:36:22,956][00194] Avg episode reward: [(0, '4.601')] +[2024-12-06 21:36:27,952][00194] Fps is (10 sec: 4914.8, 60 sec: 2662.4, 300 sec: 2662.4). Total num frames: 106496. Throughput: 0: 643.9. Samples: 25756. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-12-06 21:36:27,958][00194] Avg episode reward: [(0, '4.454')] +[2024-12-06 21:36:27,978][03821] Saving new best policy, reward=4.454! +[2024-12-06 21:36:32,847][03834] Updated weights for policy 0, policy_version 30 (0.0030) +[2024-12-06 21:36:32,951][00194] Fps is (10 sec: 4096.0, 60 sec: 2730.7, 300 sec: 2730.7). Total num frames: 122880. Throughput: 0: 690.7. Samples: 31082. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-12-06 21:36:32,958][00194] Avg episode reward: [(0, '4.543')] +[2024-12-06 21:36:32,959][03821] Saving new best policy, reward=4.543! +[2024-12-06 21:36:37,951][00194] Fps is (10 sec: 3277.1, 60 sec: 2785.3, 300 sec: 2785.3). Total num frames: 139264. Throughput: 0: 732.8. Samples: 32974. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-12-06 21:36:37,953][00194] Avg episode reward: [(0, '4.696')] +[2024-12-06 21:36:37,959][03821] Saving new best policy, reward=4.696! +[2024-12-06 21:36:42,520][03834] Updated weights for policy 0, policy_version 40 (0.0014) +[2024-12-06 21:36:42,951][00194] Fps is (10 sec: 4096.0, 60 sec: 2978.9, 300 sec: 2978.9). Total num frames: 163840. Throughput: 0: 873.9. Samples: 40278. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-12-06 21:36:42,953][00194] Avg episode reward: [(0, '4.418')] +[2024-12-06 21:36:47,956][00194] Fps is (10 sec: 4503.6, 60 sec: 3071.8, 300 sec: 3071.8). Total num frames: 184320. Throughput: 0: 962.3. Samples: 46410. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-06 21:36:47,960][00194] Avg episode reward: [(0, '4.357')] +[2024-12-06 21:36:52,951][00194] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3024.7). Total num frames: 196608. Throughput: 0: 969.2. Samples: 48578. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-12-06 21:36:52,961][00194] Avg episode reward: [(0, '4.487')] +[2024-12-06 21:36:53,891][03834] Updated weights for policy 0, policy_version 50 (0.0023) +[2024-12-06 21:36:57,952][00194] Fps is (10 sec: 3687.8, 60 sec: 3686.4, 300 sec: 3159.7). Total num frames: 221184. Throughput: 0: 962.3. Samples: 55082. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-12-06 21:36:57,961][00194] Avg episode reward: [(0, '4.384')] +[2024-12-06 21:37:02,169][03834] Updated weights for policy 0, policy_version 60 (0.0019) +[2024-12-06 21:37:02,951][00194] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3276.8). Total num frames: 245760. Throughput: 0: 1044.8. Samples: 62286. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-12-06 21:37:02,960][00194] Avg episode reward: [(0, '4.222')] +[2024-12-06 21:37:07,951][00194] Fps is (10 sec: 4096.3, 60 sec: 3891.2, 300 sec: 3276.8). Total num frames: 262144. Throughput: 0: 1022.0. Samples: 64488. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-12-06 21:37:07,953][00194] Avg episode reward: [(0, '4.219')] +[2024-12-06 21:37:12,951][00194] Fps is (10 sec: 3686.4, 60 sec: 3959.7, 300 sec: 3325.0). Total num frames: 282624. Throughput: 0: 987.1. Samples: 70176. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-06 21:37:12,953][00194] Avg episode reward: [(0, '4.527')] +[2024-12-06 21:37:13,580][03834] Updated weights for policy 0, policy_version 70 (0.0023) +[2024-12-06 21:37:17,951][00194] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3413.3). Total num frames: 307200. Throughput: 0: 1028.6. Samples: 77370. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:37:17,953][00194] Avg episode reward: [(0, '4.852')] +[2024-12-06 21:37:17,960][03821] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000075_307200.pth... +[2024-12-06 21:37:18,097][03821] Saving new best policy, reward=4.852! +[2024-12-06 21:37:22,952][00194] Fps is (10 sec: 4095.5, 60 sec: 4027.6, 300 sec: 3406.1). Total num frames: 323584. Throughput: 0: 1049.4. Samples: 80200. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:37:22,957][00194] Avg episode reward: [(0, '4.770')] +[2024-12-06 21:37:23,940][03834] Updated weights for policy 0, policy_version 80 (0.0022) +[2024-12-06 21:37:27,951][00194] Fps is (10 sec: 3276.8, 60 sec: 3891.3, 300 sec: 3399.7). Total num frames: 339968. Throughput: 0: 990.4. Samples: 84848. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:37:27,955][00194] Avg episode reward: [(0, '4.572')] +[2024-12-06 21:37:32,952][00194] Fps is (10 sec: 4096.1, 60 sec: 4027.7, 300 sec: 3471.8). Total num frames: 364544. Throughput: 0: 1018.3. Samples: 92230. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-12-06 21:37:32,961][00194] Avg episode reward: [(0, '4.337')] +[2024-12-06 21:37:33,222][03834] Updated weights for policy 0, policy_version 90 (0.0023) +[2024-12-06 21:37:37,951][00194] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3500.2). Total num frames: 385024. Throughput: 0: 1050.4. Samples: 95846. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:37:37,956][00194] Avg episode reward: [(0, '4.429')] +[2024-12-06 21:37:42,951][00194] Fps is (10 sec: 3686.8, 60 sec: 3959.5, 300 sec: 3490.5). Total num frames: 401408. Throughput: 0: 1010.1. Samples: 100536. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:37:42,955][00194] Avg episode reward: [(0, '4.665')] +[2024-12-06 21:37:44,317][03834] Updated weights for policy 0, policy_version 100 (0.0022) +[2024-12-06 21:37:47,951][00194] Fps is (10 sec: 4096.0, 60 sec: 4028.0, 300 sec: 3549.9). Total num frames: 425984. Throughput: 0: 997.3. Samples: 107166. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:37:47,958][00194] Avg episode reward: [(0, '4.909')] +[2024-12-06 21:37:47,967][03821] Saving new best policy, reward=4.909! +[2024-12-06 21:37:52,747][03834] Updated weights for policy 0, policy_version 110 (0.0023) +[2024-12-06 21:37:52,951][00194] Fps is (10 sec: 4915.1, 60 sec: 4232.5, 300 sec: 3604.5). Total num frames: 450560. Throughput: 0: 1028.7. Samples: 110780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:37:52,957][00194] Avg episode reward: [(0, '4.629')] +[2024-12-06 21:37:57,951][00194] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 3560.4). Total num frames: 462848. Throughput: 0: 1022.0. Samples: 116168. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:37:57,956][00194] Avg episode reward: [(0, '4.429')] +[2024-12-06 21:38:02,951][00194] Fps is (10 sec: 3276.9, 60 sec: 3959.5, 300 sec: 3580.2). Total num frames: 483328. Throughput: 0: 994.5. Samples: 122124. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:38:02,954][00194] Avg episode reward: [(0, '4.538')] +[2024-12-06 21:38:03,796][03834] Updated weights for policy 0, policy_version 120 (0.0015) +[2024-12-06 21:38:07,951][00194] Fps is (10 sec: 4505.7, 60 sec: 4096.0, 300 sec: 3627.9). Total num frames: 507904. Throughput: 0: 1012.7. Samples: 125770. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-06 21:38:07,953][00194] Avg episode reward: [(0, '4.514')] +[2024-12-06 21:38:12,952][00194] Fps is (10 sec: 4505.0, 60 sec: 4095.9, 300 sec: 3644.0). Total num frames: 528384. Throughput: 0: 1052.3. Samples: 132202. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-06 21:38:12,955][00194] Avg episode reward: [(0, '4.687')] +[2024-12-06 21:38:13,657][03834] Updated weights for policy 0, policy_version 130 (0.0018) +[2024-12-06 21:38:17,951][00194] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3631.8). Total num frames: 544768. Throughput: 0: 995.4. Samples: 137024. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-06 21:38:17,958][00194] Avg episode reward: [(0, '4.627')] +[2024-12-06 21:38:22,951][00194] Fps is (10 sec: 4096.6, 60 sec: 4096.1, 300 sec: 3673.2). Total num frames: 569344. Throughput: 0: 995.7. Samples: 140652. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-12-06 21:38:22,955][00194] Avg episode reward: [(0, '4.550')] +[2024-12-06 21:38:23,365][03834] Updated weights for policy 0, policy_version 140 (0.0030) +[2024-12-06 21:38:27,951][00194] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3686.4). Total num frames: 589824. Throughput: 0: 1053.2. Samples: 147932. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-06 21:38:27,959][00194] Avg episode reward: [(0, '4.411')] +[2024-12-06 21:38:32,951][00194] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 3674.0). Total num frames: 606208. Throughput: 0: 1001.7. Samples: 152242. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-06 21:38:32,957][00194] Avg episode reward: [(0, '4.444')] +[2024-12-06 21:38:34,854][03834] Updated weights for policy 0, policy_version 150 (0.0040) +[2024-12-06 21:38:37,951][00194] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3686.4). Total num frames: 626688. Throughput: 0: 990.1. Samples: 155334. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-12-06 21:38:37,957][00194] Avg episode reward: [(0, '4.508')] +[2024-12-06 21:38:42,951][00194] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3721.5). Total num frames: 651264. Throughput: 0: 1030.9. Samples: 162558. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-06 21:38:42,957][00194] Avg episode reward: [(0, '4.607')] +[2024-12-06 21:38:43,197][03834] Updated weights for policy 0, policy_version 160 (0.0033) +[2024-12-06 21:38:47,951][00194] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3709.2). Total num frames: 667648. Throughput: 0: 1012.8. Samples: 167702. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-12-06 21:38:47,954][00194] Avg episode reward: [(0, '4.593')] +[2024-12-06 21:38:52,951][00194] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3697.5). Total num frames: 684032. Throughput: 0: 979.5. Samples: 169848. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-12-06 21:38:52,954][00194] Avg episode reward: [(0, '4.409')] +[2024-12-06 21:38:55,044][03834] Updated weights for policy 0, policy_version 170 (0.0021) +[2024-12-06 21:38:57,951][00194] Fps is (10 sec: 4095.9, 60 sec: 4096.0, 300 sec: 3729.5). Total num frames: 708608. Throughput: 0: 987.4. Samples: 176636. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-12-06 21:38:57,956][00194] Avg episode reward: [(0, '4.708')] +[2024-12-06 21:39:02,952][00194] Fps is (10 sec: 4505.4, 60 sec: 4096.0, 300 sec: 3738.9). Total num frames: 729088. Throughput: 0: 1014.3. Samples: 182666. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-06 21:39:02,956][00194] Avg episode reward: [(0, '4.678')] +[2024-12-06 21:39:05,799][03834] Updated weights for policy 0, policy_version 180 (0.0028) +[2024-12-06 21:39:07,951][00194] Fps is (10 sec: 3276.9, 60 sec: 3891.2, 300 sec: 3706.9). Total num frames: 741376. Throughput: 0: 978.1. Samples: 184668. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-12-06 21:39:07,954][00194] Avg episode reward: [(0, '4.718')] +[2024-12-06 21:39:12,951][00194] Fps is (10 sec: 3686.6, 60 sec: 3959.6, 300 sec: 3736.4). Total num frames: 765952. Throughput: 0: 952.6. Samples: 190798. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-12-06 21:39:12,957][00194] Avg episode reward: [(0, '5.048')] +[2024-12-06 21:39:12,960][03821] Saving new best policy, reward=5.048! +[2024-12-06 21:39:15,700][03834] Updated weights for policy 0, policy_version 190 (0.0025) +[2024-12-06 21:39:17,952][00194] Fps is (10 sec: 4505.0, 60 sec: 4027.6, 300 sec: 3744.9). Total num frames: 786432. Throughput: 0: 1007.5. Samples: 197582. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:39:17,956][00194] Avg episode reward: [(0, '5.284')] +[2024-12-06 21:39:17,973][03821] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000192_786432.pth... +[2024-12-06 21:39:18,144][03821] Saving new best policy, reward=5.284! +[2024-12-06 21:39:22,951][00194] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3715.0). Total num frames: 798720. Throughput: 0: 983.4. Samples: 199586. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-12-06 21:39:22,954][00194] Avg episode reward: [(0, '5.222')] +[2024-12-06 21:39:27,311][03834] Updated weights for policy 0, policy_version 200 (0.0021) +[2024-12-06 21:39:27,952][00194] Fps is (10 sec: 3277.0, 60 sec: 3822.9, 300 sec: 3723.6). Total num frames: 819200. Throughput: 0: 940.6. Samples: 204886. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-12-06 21:39:27,959][00194] Avg episode reward: [(0, '5.231')] +[2024-12-06 21:39:32,951][00194] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3750.1). Total num frames: 843776. Throughput: 0: 984.1. Samples: 211988. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-12-06 21:39:32,957][00194] Avg episode reward: [(0, '5.062')] +[2024-12-06 21:39:36,837][03834] Updated weights for policy 0, policy_version 210 (0.0019) +[2024-12-06 21:39:37,952][00194] Fps is (10 sec: 4095.8, 60 sec: 3891.1, 300 sec: 3739.8). Total num frames: 860160. Throughput: 0: 1000.4. Samples: 214866. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-12-06 21:39:37,958][00194] Avg episode reward: [(0, '5.289')] +[2024-12-06 21:39:37,965][03821] Saving new best policy, reward=5.289! +[2024-12-06 21:39:42,951][00194] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3730.0). Total num frames: 876544. Throughput: 0: 943.1. Samples: 219074. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-12-06 21:39:42,955][00194] Avg episode reward: [(0, '5.240')] +[2024-12-06 21:39:47,865][03834] Updated weights for policy 0, policy_version 220 (0.0016) +[2024-12-06 21:39:47,951][00194] Fps is (10 sec: 4096.4, 60 sec: 3891.2, 300 sec: 3754.7). Total num frames: 901120. Throughput: 0: 958.8. Samples: 225812. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-12-06 21:39:47,954][00194] Avg episode reward: [(0, '5.303')] +[2024-12-06 21:39:47,964][03821] Saving new best policy, reward=5.303! +[2024-12-06 21:39:52,951][00194] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3761.6). Total num frames: 921600. Throughput: 0: 989.9. Samples: 229212. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:39:52,956][00194] Avg episode reward: [(0, '5.588')] +[2024-12-06 21:39:52,959][03821] Saving new best policy, reward=5.588! +[2024-12-06 21:39:57,952][00194] Fps is (10 sec: 3276.7, 60 sec: 3754.7, 300 sec: 3735.5). Total num frames: 933888. Throughput: 0: 951.5. Samples: 233616. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-06 21:39:57,959][00194] Avg episode reward: [(0, '5.383')] +[2024-12-06 21:39:59,624][03834] Updated weights for policy 0, policy_version 230 (0.0028) +[2024-12-06 21:40:02,951][00194] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3742.6). Total num frames: 954368. Throughput: 0: 933.8. Samples: 239602. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-06 21:40:02,954][00194] Avg episode reward: [(0, '5.604')] +[2024-12-06 21:40:02,959][03821] Saving new best policy, reward=5.604! +[2024-12-06 21:40:07,951][00194] Fps is (10 sec: 4505.8, 60 sec: 3959.5, 300 sec: 3765.2). Total num frames: 978944. Throughput: 0: 966.8. Samples: 243094. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-06 21:40:07,953][00194] Avg episode reward: [(0, '5.769')] +[2024-12-06 21:40:07,962][03821] Saving new best policy, reward=5.769! +[2024-12-06 21:40:08,592][03834] Updated weights for policy 0, policy_version 240 (0.0019) +[2024-12-06 21:40:12,953][00194] Fps is (10 sec: 4095.3, 60 sec: 3822.8, 300 sec: 3755.9). Total num frames: 995328. Throughput: 0: 971.7. Samples: 248614. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:40:12,955][00194] Avg episode reward: [(0, '5.681')] +[2024-12-06 21:40:17,954][00194] Fps is (10 sec: 3276.0, 60 sec: 3754.6, 300 sec: 3747.0). Total num frames: 1011712. Throughput: 0: 933.1. Samples: 253978. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:40:17,957][00194] Avg episode reward: [(0, '5.700')] +[2024-12-06 21:40:19,959][03834] Updated weights for policy 0, policy_version 250 (0.0017) +[2024-12-06 21:40:22,951][00194] Fps is (10 sec: 4096.8, 60 sec: 3959.5, 300 sec: 3768.3). Total num frames: 1036288. Throughput: 0: 948.6. Samples: 257552. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:40:22,957][00194] Avg episode reward: [(0, '6.051')] +[2024-12-06 21:40:22,962][03821] Saving new best policy, reward=6.051! +[2024-12-06 21:40:27,951][00194] Fps is (10 sec: 4506.7, 60 sec: 3959.5, 300 sec: 3774.2). Total num frames: 1056768. Throughput: 0: 998.3. Samples: 263996. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-06 21:40:27,959][00194] Avg episode reward: [(0, '6.444')] +[2024-12-06 21:40:27,967][03821] Saving new best policy, reward=6.444! +[2024-12-06 21:40:30,890][03834] Updated weights for policy 0, policy_version 260 (0.0014) +[2024-12-06 21:40:32,951][00194] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3751.1). Total num frames: 1069056. Throughput: 0: 950.0. Samples: 268560. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:40:32,959][00194] Avg episode reward: [(0, '6.418')] +[2024-12-06 21:40:37,951][00194] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3771.1). Total num frames: 1093632. Throughput: 0: 953.7. Samples: 272128. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:40:37,954][00194] Avg episode reward: [(0, '6.635')] +[2024-12-06 21:40:37,966][03821] Saving new best policy, reward=6.635! +[2024-12-06 21:40:40,026][03834] Updated weights for policy 0, policy_version 270 (0.0021) +[2024-12-06 21:40:42,951][00194] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3790.5). Total num frames: 1118208. Throughput: 0: 1013.3. Samples: 279214. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:40:42,953][00194] Avg episode reward: [(0, '7.083')] +[2024-12-06 21:40:42,956][03821] Saving new best policy, reward=7.083! +[2024-12-06 21:40:47,951][00194] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 1130496. Throughput: 0: 979.6. Samples: 283686. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-06 21:40:47,962][00194] Avg episode reward: [(0, '6.883')] +[2024-12-06 21:40:51,447][03834] Updated weights for policy 0, policy_version 280 (0.0035) +[2024-12-06 21:40:52,951][00194] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 1150976. Throughput: 0: 965.7. Samples: 286552. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:40:52,957][00194] Avg episode reward: [(0, '6.988')] +[2024-12-06 21:40:57,951][00194] Fps is (10 sec: 4505.5, 60 sec: 4027.8, 300 sec: 3984.9). Total num frames: 1175552. Throughput: 0: 1002.6. Samples: 293730. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-06 21:40:57,956][00194] Avg episode reward: [(0, '7.069')] +[2024-12-06 21:41:00,669][03834] Updated weights for policy 0, policy_version 290 (0.0017) +[2024-12-06 21:41:02,951][00194] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 1191936. Throughput: 0: 1006.4. Samples: 299262. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-12-06 21:41:02,959][00194] Avg episode reward: [(0, '7.471')] +[2024-12-06 21:41:02,961][03821] Saving new best policy, reward=7.471! +[2024-12-06 21:41:07,951][00194] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3943.3). Total num frames: 1208320. Throughput: 0: 972.6. Samples: 301318. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-12-06 21:41:07,959][00194] Avg episode reward: [(0, '7.348')] +[2024-12-06 21:41:11,392][03834] Updated weights for policy 0, policy_version 300 (0.0017) +[2024-12-06 21:41:12,951][00194] Fps is (10 sec: 4096.0, 60 sec: 3959.6, 300 sec: 3984.9). Total num frames: 1232896. Throughput: 0: 979.7. Samples: 308084. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-06 21:41:12,956][00194] Avg episode reward: [(0, '7.586')] +[2024-12-06 21:41:12,959][03821] Saving new best policy, reward=7.586! +[2024-12-06 21:41:17,951][00194] Fps is (10 sec: 4505.6, 60 sec: 4027.9, 300 sec: 3971.0). Total num frames: 1253376. Throughput: 0: 1013.3. Samples: 314158. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-06 21:41:17,956][00194] Avg episode reward: [(0, '8.365')] +[2024-12-06 21:41:17,969][03821] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000306_1253376.pth... +[2024-12-06 21:41:18,129][03821] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000075_307200.pth +[2024-12-06 21:41:18,150][03821] Saving new best policy, reward=8.365! +[2024-12-06 21:41:22,951][00194] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 1265664. Throughput: 0: 977.3. Samples: 316106. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-06 21:41:22,953][00194] Avg episode reward: [(0, '8.468')] +[2024-12-06 21:41:22,958][03821] Saving new best policy, reward=8.468! +[2024-12-06 21:41:23,362][03834] Updated weights for policy 0, policy_version 310 (0.0029) +[2024-12-06 21:41:27,951][00194] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3957.1). Total num frames: 1290240. Throughput: 0: 951.0. Samples: 322010. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-12-06 21:41:27,958][00194] Avg episode reward: [(0, '9.212')] +[2024-12-06 21:41:27,975][03821] Saving new best policy, reward=9.212! +[2024-12-06 21:41:32,130][03834] Updated weights for policy 0, policy_version 320 (0.0017) +[2024-12-06 21:41:32,951][00194] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 1310720. Throughput: 0: 1005.3. Samples: 328924. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:41:32,957][00194] Avg episode reward: [(0, '10.342')] +[2024-12-06 21:41:32,961][03821] Saving new best policy, reward=10.342! +[2024-12-06 21:41:37,951][00194] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 1327104. Throughput: 0: 991.2. Samples: 331156. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-12-06 21:41:37,957][00194] Avg episode reward: [(0, '10.814')] +[2024-12-06 21:41:37,970][03821] Saving new best policy, reward=10.814! +[2024-12-06 21:41:42,951][00194] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3929.4). Total num frames: 1343488. Throughput: 0: 940.2. Samples: 336040. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-12-06 21:41:42,953][00194] Avg episode reward: [(0, '11.241')] +[2024-12-06 21:41:42,959][03821] Saving new best policy, reward=11.241! 
+[2024-12-06 21:41:44,119][03834] Updated weights for policy 0, policy_version 330 (0.0051) +[2024-12-06 21:41:47,951][00194] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1368064. Throughput: 0: 970.7. Samples: 342944. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-12-06 21:41:47,953][00194] Avg episode reward: [(0, '11.751')] +[2024-12-06 21:41:47,964][03821] Saving new best policy, reward=11.751! +[2024-12-06 21:41:52,951][00194] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 1388544. Throughput: 0: 1000.1. Samples: 346324. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-12-06 21:41:52,953][00194] Avg episode reward: [(0, '12.128')] +[2024-12-06 21:41:52,967][03821] Saving new best policy, reward=12.128! +[2024-12-06 21:41:54,461][03834] Updated weights for policy 0, policy_version 340 (0.0028) +[2024-12-06 21:41:57,952][00194] Fps is (10 sec: 3276.7, 60 sec: 3754.7, 300 sec: 3915.5). Total num frames: 1400832. Throughput: 0: 946.6. Samples: 350680. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-12-06 21:41:57,956][00194] Avg episode reward: [(0, '12.295')] +[2024-12-06 21:41:58,031][03821] Saving new best policy, reward=12.295! +[2024-12-06 21:42:02,951][00194] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 1429504. Throughput: 0: 969.2. Samples: 357774. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-12-06 21:42:02,957][00194] Avg episode reward: [(0, '13.943')] +[2024-12-06 21:42:02,959][03821] Saving new best policy, reward=13.943! +[2024-12-06 21:42:03,807][03834] Updated weights for policy 0, policy_version 350 (0.0027) +[2024-12-06 21:42:07,951][00194] Fps is (10 sec: 4915.3, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 1449984. Throughput: 0: 1005.6. Samples: 361358. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-06 21:42:07,954][00194] Avg episode reward: [(0, '14.164')] +[2024-12-06 21:42:07,965][03821] Saving new best policy, reward=14.164! +[2024-12-06 21:42:12,952][00194] Fps is (10 sec: 3686.2, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 1466368. Throughput: 0: 985.9. Samples: 366376. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-06 21:42:12,954][00194] Avg episode reward: [(0, '13.678')] +[2024-12-06 21:42:14,927][03834] Updated weights for policy 0, policy_version 360 (0.0023) +[2024-12-06 21:42:17,951][00194] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 1486848. Throughput: 0: 967.7. Samples: 372472. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-06 21:42:17,953][00194] Avg episode reward: [(0, '13.323')] +[2024-12-06 21:42:22,951][00194] Fps is (10 sec: 4505.7, 60 sec: 4096.0, 300 sec: 3971.0). Total num frames: 1511424. Throughput: 0: 999.9. Samples: 376152. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:42:22,954][00194] Avg episode reward: [(0, '13.358')] +[2024-12-06 21:42:23,571][03834] Updated weights for policy 0, policy_version 370 (0.0013) +[2024-12-06 21:42:27,952][00194] Fps is (10 sec: 4095.7, 60 sec: 3959.4, 300 sec: 3943.3). Total num frames: 1527808. Throughput: 0: 1026.1. Samples: 382214. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:42:27,960][00194] Avg episode reward: [(0, '12.903')] +[2024-12-06 21:42:32,951][00194] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 1548288. Throughput: 0: 993.0. Samples: 387630. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:42:32,958][00194] Avg episode reward: [(0, '14.861')] +[2024-12-06 21:42:32,961][03821] Saving new best policy, reward=14.861! +[2024-12-06 21:42:34,553][03834] Updated weights for policy 0, policy_version 380 (0.0026) +[2024-12-06 21:42:37,951][00194] Fps is (10 sec: 4506.0, 60 sec: 4096.0, 300 sec: 3971.0). Total num frames: 1572864. Throughput: 0: 996.9. Samples: 391186. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:42:37,957][00194] Avg episode reward: [(0, '15.081')] +[2024-12-06 21:42:37,968][03821] Saving new best policy, reward=15.081! +[2024-12-06 21:42:42,952][00194] Fps is (10 sec: 4505.0, 60 sec: 4164.2, 300 sec: 3957.1). Total num frames: 1593344. Throughput: 0: 1053.2. Samples: 398076. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:42:42,956][00194] Avg episode reward: [(0, '15.144')] +[2024-12-06 21:42:42,958][03821] Saving new best policy, reward=15.144! +[2024-12-06 21:42:44,365][03834] Updated weights for policy 0, policy_version 390 (0.0020) +[2024-12-06 21:42:47,951][00194] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 1605632. Throughput: 0: 993.0. Samples: 402460. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-12-06 21:42:47,953][00194] Avg episode reward: [(0, '16.380')] +[2024-12-06 21:42:47,961][03821] Saving new best policy, reward=16.380! +[2024-12-06 21:42:52,951][00194] Fps is (10 sec: 3686.8, 60 sec: 4027.7, 300 sec: 3957.1). Total num frames: 1630208. Throughput: 0: 991.6. Samples: 405978. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-06 21:42:52,958][00194] Avg episode reward: [(0, '15.367')] +[2024-12-06 21:42:54,148][03834] Updated weights for policy 0, policy_version 400 (0.0047) +[2024-12-06 21:42:57,952][00194] Fps is (10 sec: 4914.9, 60 sec: 4232.5, 300 sec: 3971.0). Total num frames: 1654784. Throughput: 0: 1041.7. Samples: 413254. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-12-06 21:42:57,955][00194] Avg episode reward: [(0, '15.883')] +[2024-12-06 21:43:02,954][00194] Fps is (10 sec: 3685.6, 60 sec: 3959.3, 300 sec: 3929.3). Total num frames: 1667072. Throughput: 0: 1015.7. Samples: 418182. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-12-06 21:43:02,963][00194] Avg episode reward: [(0, '16.148')] +[2024-12-06 21:43:05,205][03834] Updated weights for policy 0, policy_version 410 (0.0013) +[2024-12-06 21:43:07,951][00194] Fps is (10 sec: 3686.6, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 1691648. Throughput: 0: 996.9. Samples: 421014. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-06 21:43:07,953][00194] Avg episode reward: [(0, '15.920')] +[2024-12-06 21:43:12,951][00194] Fps is (10 sec: 4916.4, 60 sec: 4164.3, 300 sec: 3971.0). Total num frames: 1716224. Throughput: 0: 1028.2. Samples: 428480. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-12-06 21:43:12,960][00194] Avg episode reward: [(0, '15.830')] +[2024-12-06 21:43:13,589][03834] Updated weights for policy 0, policy_version 420 (0.0029) +[2024-12-06 21:43:17,952][00194] Fps is (10 sec: 4095.8, 60 sec: 4096.0, 300 sec: 3943.3). Total num frames: 1732608. Throughput: 0: 1033.6. Samples: 434144. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:43:17,954][00194] Avg episode reward: [(0, '16.198')] +[2024-12-06 21:43:17,969][03821] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000423_1732608.pth... 
+[2024-12-06 21:43:18,146][03821] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000192_786432.pth +[2024-12-06 21:43:22,951][00194] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 1748992. Throughput: 0: 1003.0. Samples: 436320. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:43:22,958][00194] Avg episode reward: [(0, '15.561')] +[2024-12-06 21:43:24,826][03834] Updated weights for policy 0, policy_version 430 (0.0025) +[2024-12-06 21:43:27,952][00194] Fps is (10 sec: 4096.1, 60 sec: 4096.0, 300 sec: 3957.1). Total num frames: 1773568. Throughput: 0: 1005.0. Samples: 443300. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:43:27,954][00194] Avg episode reward: [(0, '15.741')] +[2024-12-06 21:43:32,951][00194] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 3971.0). Total num frames: 1798144. Throughput: 0: 1058.7. Samples: 450102. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:43:32,955][00194] Avg episode reward: [(0, '16.596')] +[2024-12-06 21:43:32,958][03821] Saving new best policy, reward=16.596! +[2024-12-06 21:43:34,442][03834] Updated weights for policy 0, policy_version 440 (0.0025) +[2024-12-06 21:43:37,954][00194] Fps is (10 sec: 3685.6, 60 sec: 3959.3, 300 sec: 3929.3). Total num frames: 1810432. Throughput: 0: 1025.8. Samples: 452142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-06 21:43:37,956][00194] Avg episode reward: [(0, '17.121')] +[2024-12-06 21:43:37,968][03821] Saving new best policy, reward=17.121! +[2024-12-06 21:43:42,951][00194] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 3957.2). Total num frames: 1835008. Throughput: 0: 1001.1. Samples: 458302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:43:42,958][00194] Avg episode reward: [(0, '17.992')] +[2024-12-06 21:43:42,961][03821] Saving new best policy, reward=17.992! +[2024-12-06 21:43:44,367][03834] Updated weights for policy 0, policy_version 450 (0.0020) +[2024-12-06 21:43:47,951][00194] Fps is (10 sec: 4916.4, 60 sec: 4232.5, 300 sec: 3984.9). Total num frames: 1859584. Throughput: 0: 1052.2. Samples: 465530. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-12-06 21:43:47,954][00194] Avg episode reward: [(0, '19.185')] +[2024-12-06 21:43:47,966][03821] Saving new best policy, reward=19.185! +[2024-12-06 21:43:52,951][00194] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 3943.3). Total num frames: 1871872. Throughput: 0: 1039.1. Samples: 467774. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-06 21:43:52,956][00194] Avg episode reward: [(0, '20.053')] +[2024-12-06 21:43:52,960][03821] Saving new best policy, reward=20.053! +[2024-12-06 21:43:55,729][03834] Updated weights for policy 0, policy_version 460 (0.0026) +[2024-12-06 21:43:57,951][00194] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 1892352. Throughput: 0: 987.2. Samples: 472906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-06 21:43:57,960][00194] Avg episode reward: [(0, '21.220')] +[2024-12-06 21:43:57,969][03821] Saving new best policy, reward=21.220! +[2024-12-06 21:44:02,951][00194] Fps is (10 sec: 4505.6, 60 sec: 4164.4, 300 sec: 3984.9). Total num frames: 1916928. Throughput: 0: 1022.6. Samples: 480160. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-12-06 21:44:02,957][00194] Avg episode reward: [(0, '20.777')] +[2024-12-06 21:44:04,195][03834] Updated weights for policy 0, policy_version 470 (0.0016) +[2024-12-06 21:44:07,951][00194] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3971.0). Total num frames: 1937408. Throughput: 0: 1049.4. Samples: 483544. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-06 21:44:07,960][00194] Avg episode reward: [(0, '20.635')] +[2024-12-06 21:44:12,951][00194] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 1953792. Throughput: 0: 992.7. Samples: 487972. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-06 21:44:12,954][00194] Avg episode reward: [(0, '20.687')] +[2024-12-06 21:44:15,156][03834] Updated weights for policy 0, policy_version 480 (0.0028) +[2024-12-06 21:44:17,951][00194] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3998.8). Total num frames: 1978368. Throughput: 0: 1004.8. Samples: 495316. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:44:17,953][00194] Avg episode reward: [(0, '19.332')] +[2024-12-06 21:44:22,951][00194] Fps is (10 sec: 4505.4, 60 sec: 4164.2, 300 sec: 3998.8). Total num frames: 1998848. Throughput: 0: 1038.6. Samples: 498878. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:44:22,957][00194] Avg episode reward: [(0, '20.276')] +[2024-12-06 21:44:24,583][03834] Updated weights for policy 0, policy_version 490 (0.0026) +[2024-12-06 21:44:27,953][00194] Fps is (10 sec: 3685.6, 60 sec: 4027.6, 300 sec: 3971.0). Total num frames: 2015232. Throughput: 0: 1010.7. Samples: 503786. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-12-06 21:44:27,957][00194] Avg episode reward: [(0, '20.581')] +[2024-12-06 21:44:32,951][00194] Fps is (10 sec: 4096.2, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 2039808. Throughput: 0: 994.4. Samples: 510280. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:44:32,958][00194] Avg episode reward: [(0, '21.826')] +[2024-12-06 21:44:32,964][03821] Saving new best policy, reward=21.826! +[2024-12-06 21:44:34,633][03834] Updated weights for policy 0, policy_version 500 (0.0020) +[2024-12-06 21:44:37,951][00194] Fps is (10 sec: 4506.5, 60 sec: 4164.4, 300 sec: 4012.7). Total num frames: 2060288. Throughput: 0: 1024.5. Samples: 513876. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-06 21:44:37,958][00194] Avg episode reward: [(0, '21.276')] +[2024-12-06 21:44:42,951][00194] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 2076672. Throughput: 0: 1040.4. Samples: 519726. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-06 21:44:42,954][00194] Avg episode reward: [(0, '22.396')] +[2024-12-06 21:44:43,009][03821] Saving new best policy, reward=22.396! +[2024-12-06 21:44:45,754][03834] Updated weights for policy 0, policy_version 510 (0.0018) +[2024-12-06 21:44:47,951][00194] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 2097152. Throughput: 0: 1003.3. Samples: 525308. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-06 21:44:47,954][00194] Avg episode reward: [(0, '22.030')] +[2024-12-06 21:44:52,951][00194] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4026.6). Total num frames: 2121728. Throughput: 0: 1006.8. Samples: 528852. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:44:52,953][00194] Avg episode reward: [(0, '21.713')] +[2024-12-06 21:44:54,138][03834] Updated weights for policy 0, policy_version 520 (0.0014) +[2024-12-06 21:44:57,951][00194] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4026.6). Total num frames: 2142208. Throughput: 0: 1061.2. Samples: 535728. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-12-06 21:44:57,953][00194] Avg episode reward: [(0, '21.761')] +[2024-12-06 21:45:02,951][00194] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 2158592. Throughput: 0: 998.0. Samples: 540226. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:45:02,955][00194] Avg episode reward: [(0, '21.658')] +[2024-12-06 21:45:05,294][03834] Updated weights for policy 0, policy_version 530 (0.0021) +[2024-12-06 21:45:07,951][00194] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 2183168. Throughput: 0: 1000.7. Samples: 543910. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-12-06 21:45:07,954][00194] Avg episode reward: [(0, '21.256')] +[2024-12-06 21:45:12,952][00194] Fps is (10 sec: 4914.9, 60 sec: 4232.5, 300 sec: 4054.4). Total num frames: 2207744. Throughput: 0: 1055.7. Samples: 551292. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:45:12,957][00194] Avg episode reward: [(0, '20.374')] +[2024-12-06 21:45:14,185][03834] Updated weights for policy 0, policy_version 540 (0.0022) +[2024-12-06 21:45:17,951][00194] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 2220032. Throughput: 0: 1021.0. Samples: 556224. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-06 21:45:17,962][00194] Avg episode reward: [(0, '20.272')] +[2024-12-06 21:45:17,978][03821] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000542_2220032.pth... +[2024-12-06 21:45:18,130][03821] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000306_1253376.pth +[2024-12-06 21:45:22,951][00194] Fps is (10 sec: 3277.0, 60 sec: 4027.8, 300 sec: 4012.7). Total num frames: 2240512. Throughput: 0: 999.6. Samples: 558860. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:45:22,953][00194] Avg episode reward: [(0, '20.027')] +[2024-12-06 21:45:24,728][03834] Updated weights for policy 0, policy_version 550 (0.0031) +[2024-12-06 21:45:27,952][00194] Fps is (10 sec: 4505.3, 60 sec: 4164.4, 300 sec: 4054.3). Total num frames: 2265088. Throughput: 0: 1035.0. Samples: 566300. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:45:27,963][00194] Avg episode reward: [(0, '19.816')] +[2024-12-06 21:45:32,951][00194] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2285568. Throughput: 0: 1041.6. Samples: 572178. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:45:32,954][00194] Avg episode reward: [(0, '19.862')] +[2024-12-06 21:45:35,268][03834] Updated weights for policy 0, policy_version 560 (0.0027) +[2024-12-06 21:45:37,958][00194] Fps is (10 sec: 3684.3, 60 sec: 4027.3, 300 sec: 4012.6). Total num frames: 2301952. Throughput: 0: 1011.5. Samples: 574374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-06 21:45:37,964][00194] Avg episode reward: [(0, '19.439')] +[2024-12-06 21:45:42,951][00194] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4054.3). Total num frames: 2326528. Throughput: 0: 1015.6. Samples: 581432. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-06 21:45:42,954][00194] Avg episode reward: [(0, '18.503')] +[2024-12-06 21:45:44,040][03834] Updated weights for policy 0, policy_version 570 (0.0017) +[2024-12-06 21:45:47,951][00194] Fps is (10 sec: 4918.4, 60 sec: 4232.5, 300 sec: 4068.2). Total num frames: 2351104. Throughput: 0: 1067.6. Samples: 588266. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:45:47,960][00194] Avg episode reward: [(0, '19.297')] +[2024-12-06 21:45:52,951][00194] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 2363392. Throughput: 0: 1033.5. Samples: 590416. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:45:52,957][00194] Avg episode reward: [(0, '19.444')] +[2024-12-06 21:45:55,069][03834] Updated weights for policy 0, policy_version 580 (0.0035) +[2024-12-06 21:45:57,951][00194] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 2387968. Throughput: 0: 1005.2. Samples: 596526. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-06 21:45:57,956][00194] Avg episode reward: [(0, '19.112')] +[2024-12-06 21:46:02,951][00194] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4082.1). Total num frames: 2412544. Throughput: 0: 1059.4. Samples: 603896. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-06 21:46:02,955][00194] Avg episode reward: [(0, '19.176')] +[2024-12-06 21:46:03,462][03834] Updated weights for policy 0, policy_version 590 (0.0019) +[2024-12-06 21:46:07,951][00194] Fps is (10 sec: 4095.9, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 2428928. Throughput: 0: 1058.7. Samples: 606500. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-06 21:46:07,958][00194] Avg episode reward: [(0, '19.872')] +[2024-12-06 21:46:12,951][00194] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 4054.3). Total num frames: 2449408. Throughput: 0: 1009.4. Samples: 611722. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-06 21:46:12,953][00194] Avg episode reward: [(0, '19.748')] +[2024-12-06 21:46:14,314][03834] Updated weights for policy 0, policy_version 600 (0.0015) +[2024-12-06 21:46:17,951][00194] Fps is (10 sec: 4505.8, 60 sec: 4232.5, 300 sec: 4096.0). Total num frames: 2473984. Throughput: 0: 1043.6. Samples: 619140. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:46:17,957][00194] Avg episode reward: [(0, '19.702')] +[2024-12-06 21:46:22,953][00194] Fps is (10 sec: 4095.3, 60 sec: 4164.1, 300 sec: 4068.2). Total num frames: 2490368. Throughput: 0: 1067.4. Samples: 622402. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-12-06 21:46:22,958][00194] Avg episode reward: [(0, '19.524')] +[2024-12-06 21:46:24,784][03834] Updated weights for policy 0, policy_version 610 (0.0013) +[2024-12-06 21:46:27,952][00194] Fps is (10 sec: 3276.4, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 2506752. Throughput: 0: 1009.4. Samples: 626856. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-12-06 21:46:27,959][00194] Avg episode reward: [(0, '20.822')] +[2024-12-06 21:46:32,951][00194] Fps is (10 sec: 4096.7, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 2531328. Throughput: 0: 1020.8. Samples: 634200. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-06 21:46:32,958][00194] Avg episode reward: [(0, '22.453')] +[2024-12-06 21:46:32,990][03821] Saving new best policy, reward=22.453! 
+[2024-12-06 21:46:33,937][03834] Updated weights for policy 0, policy_version 620 (0.0041)
+[2024-12-06 21:46:37,951][00194] Fps is (10 sec: 4915.8, 60 sec: 4233.0, 300 sec: 4109.9). Total num frames: 2555904. Throughput: 0: 1052.8. Samples: 637794. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-12-06 21:46:37,956][00194] Avg episode reward: [(0, '23.393')]
+[2024-12-06 21:46:37,965][03821] Saving new best policy, reward=23.393!
+[2024-12-06 21:46:42,951][00194] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 2568192. Throughput: 0: 1026.9. Samples: 642738. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-12-06 21:46:42,956][00194] Avg episode reward: [(0, '22.955')]
+[2024-12-06 21:46:44,930][03834] Updated weights for policy 0, policy_version 630 (0.0017)
+[2024-12-06 21:46:47,951][00194] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 2592768. Throughput: 0: 1005.2. Samples: 649132. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-06 21:46:47,953][00194] Avg episode reward: [(0, '24.382')]
+[2024-12-06 21:46:47,960][03821] Saving new best policy, reward=24.382!
+[2024-12-06 21:46:52,951][00194] Fps is (10 sec: 4915.1, 60 sec: 4232.5, 300 sec: 4123.8). Total num frames: 2617344. Throughput: 0: 1025.5. Samples: 652648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-06 21:46:52,958][00194] Avg episode reward: [(0, '23.621')]
+[2024-12-06 21:46:53,588][03834] Updated weights for policy 0, policy_version 640 (0.0016)
+[2024-12-06 21:46:57,951][00194] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 2633728. Throughput: 0: 1039.9. Samples: 658518. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-12-06 21:46:57,959][00194] Avg episode reward: [(0, '21.863')]
+[2024-12-06 21:47:02,951][00194] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 2654208. Throughput: 0: 998.2. Samples: 664058. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-06 21:47:02,953][00194] Avg episode reward: [(0, '21.931')]
+[2024-12-06 21:47:04,439][03834] Updated weights for policy 0, policy_version 650 (0.0015)
+[2024-12-06 21:47:07,951][00194] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 2678784. Throughput: 0: 1008.2. Samples: 667770. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-06 21:47:07,954][00194] Avg episode reward: [(0, '21.018')]
+[2024-12-06 21:47:12,951][00194] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 2699264. Throughput: 0: 1062.9. Samples: 674684. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-12-06 21:47:12,958][00194] Avg episode reward: [(0, '20.070')]
+[2024-12-06 21:47:14,077][03834] Updated weights for policy 0, policy_version 660 (0.0021)
+[2024-12-06 21:47:17,951][00194] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 2715648. Throughput: 0: 1002.5. Samples: 679312. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-06 21:47:17,957][00194] Avg episode reward: [(0, '20.752')]
+[2024-12-06 21:47:17,966][03821] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000663_2715648.pth...
+[2024-12-06 21:47:18,091][03821] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000423_1732608.pth
+[2024-12-06 21:47:22,951][00194] Fps is (10 sec: 3686.4, 60 sec: 4096.1, 300 sec: 4096.0). Total num frames: 2736128. Throughput: 0: 999.3. Samples: 682762. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-06 21:47:22,954][00194] Avg episode reward: [(0, '21.864')]
+[2024-12-06 21:47:24,079][03834] Updated weights for policy 0, policy_version 670 (0.0020)
+[2024-12-06 21:47:27,951][00194] Fps is (10 sec: 4505.6, 60 sec: 4232.6, 300 sec: 4109.9). Total num frames: 2760704. Throughput: 0: 1053.1. Samples: 690128. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-12-06 21:47:27,958][00194] Avg episode reward: [(0, '23.079')]
+[2024-12-06 21:47:32,951][00194] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 2777088. Throughput: 0: 1016.6. Samples: 694878. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2024-12-06 21:47:32,953][00194] Avg episode reward: [(0, '23.127')]
+[2024-12-06 21:47:34,998][03834] Updated weights for policy 0, policy_version 680 (0.0028)
+[2024-12-06 21:47:37,951][00194] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 2797568. Throughput: 0: 1005.7. Samples: 697906. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-06 21:47:37,953][00194] Avg episode reward: [(0, '24.003')]
+[2024-12-06 21:47:42,951][00194] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4123.8). Total num frames: 2822144. Throughput: 0: 1039.1. Samples: 705278. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-12-06 21:47:42,954][00194] Avg episode reward: [(0, '24.489')]
+[2024-12-06 21:47:42,957][03821] Saving new best policy, reward=24.489!
+[2024-12-06 21:47:43,241][03834] Updated weights for policy 0, policy_version 690 (0.0024)
+[2024-12-06 21:47:47,951][00194] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 2838528. Throughput: 0: 1036.6. Samples: 710704. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-06 21:47:47,956][00194] Avg episode reward: [(0, '23.130')]
+[2024-12-06 21:47:52,951][00194] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4068.2). Total num frames: 2854912. Throughput: 0: 1000.3. Samples: 712784. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-06 21:47:52,954][00194] Avg episode reward: [(0, '22.469')]
+[2024-12-06 21:47:54,876][03834] Updated weights for policy 0, policy_version 700 (0.0021)
+[2024-12-06 21:47:57,952][00194] Fps is (10 sec: 4095.8, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 2879488. Throughput: 0: 996.1. Samples: 719508. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-06 21:47:57,960][00194] Avg episode reward: [(0, '22.570')]
+[2024-12-06 21:48:02,951][00194] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 2899968. Throughput: 0: 1034.3. Samples: 725856. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-12-06 21:48:02,955][00194] Avg episode reward: [(0, '23.098')]
+[2024-12-06 21:48:05,131][03834] Updated weights for policy 0, policy_version 710 (0.0025)
+[2024-12-06 21:48:07,951][00194] Fps is (10 sec: 3686.6, 60 sec: 3959.5, 300 sec: 4068.2). Total num frames: 2916352. Throughput: 0: 1005.0. Samples: 727988. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-12-06 21:48:07,957][00194] Avg episode reward: [(0, '23.021')]
+[2024-12-06 21:48:12,951][00194] Fps is (10 sec: 3686.3, 60 sec: 3959.4, 300 sec: 4082.1). Total num frames: 2936832. Throughput: 0: 982.0. Samples: 734318. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-06 21:48:12,958][00194] Avg episode reward: [(0, '22.613')]
+[2024-12-06 21:48:14,657][03834] Updated weights for policy 0, policy_version 720 (0.0026)
+[2024-12-06 21:48:17,951][00194] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 2965504. Throughput: 0: 1041.4. Samples: 741740. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-06 21:48:17,955][00194] Avg episode reward: [(0, '24.013')]
+[2024-12-06 21:48:22,951][00194] Fps is (10 sec: 4096.1, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 2977792. Throughput: 0: 1027.0. Samples: 744120. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-12-06 21:48:22,961][00194] Avg episode reward: [(0, '24.282')]
+[2024-12-06 21:48:25,637][03834] Updated weights for policy 0, policy_version 730 (0.0023)
+[2024-12-06 21:48:27,951][00194] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4068.2). Total num frames: 2998272. Throughput: 0: 980.1. Samples: 749382. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-12-06 21:48:27,954][00194] Avg episode reward: [(0, '24.768')]
+[2024-12-06 21:48:28,008][03821] Saving new best policy, reward=24.768!
+[2024-12-06 21:48:32,951][00194] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 3022848. Throughput: 0: 1023.9. Samples: 756780. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-12-06 21:48:32,954][00194] Avg episode reward: [(0, '25.552')]
+[2024-12-06 21:48:32,961][03821] Saving new best policy, reward=25.552!
+[2024-12-06 21:48:34,026][03834] Updated weights for policy 0, policy_version 740 (0.0019)
+[2024-12-06 21:48:37,951][00194] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 3043328. Throughput: 0: 1051.9. Samples: 760120. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-12-06 21:48:37,954][00194] Avg episode reward: [(0, '25.380')]
+[2024-12-06 21:48:42,951][00194] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4068.2). Total num frames: 3059712. Throughput: 0: 1001.6. Samples: 764580. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-06 21:48:42,953][00194] Avg episode reward: [(0, '27.022')]
+[2024-12-06 21:48:42,958][03821] Saving new best policy, reward=27.022!
+[2024-12-06 21:48:45,087][03834] Updated weights for policy 0, policy_version 750 (0.0033)
+[2024-12-06 21:48:47,951][00194] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 3084288. Throughput: 0: 1021.1. Samples: 771806. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-12-06 21:48:47,956][00194] Avg episode reward: [(0, '24.393')]
+[2024-12-06 21:48:52,956][00194] Fps is (10 sec: 4503.6, 60 sec: 4164.0, 300 sec: 4109.8). Total num frames: 3104768. Throughput: 0: 1055.3. Samples: 775480. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-06 21:48:52,961][00194] Avg episode reward: [(0, '22.959')]
+[2024-12-06 21:48:54,657][03834] Updated weights for policy 0, policy_version 760 (0.0024)
+[2024-12-06 21:48:57,951][00194] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 4082.1). Total num frames: 3121152. Throughput: 0: 1025.6. Samples: 780468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-06 21:48:57,956][00194] Avg episode reward: [(0, '23.913')]
+[2024-12-06 21:49:02,951][00194] Fps is (10 sec: 4097.9, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 3145728. Throughput: 0: 1003.6. Samples: 786902. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-06 21:49:02,958][00194] Avg episode reward: [(0, '23.160')]
+[2024-12-06 21:49:04,519][03834] Updated weights for policy 0, policy_version 770 (0.0026)
+[2024-12-06 21:49:07,951][00194] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4123.8). Total num frames: 3170304. Throughput: 0: 1031.6. Samples: 790542. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-12-06 21:49:07,955][00194] Avg episode reward: [(0, '22.430')]
+[2024-12-06 21:49:12,951][00194] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4096.0). Total num frames: 3186688. Throughput: 0: 1047.6. Samples: 796522. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-06 21:49:12,957][00194] Avg episode reward: [(0, '23.418')]
+[2024-12-06 21:49:15,449][03834] Updated weights for policy 0, policy_version 780 (0.0027)
+[2024-12-06 21:49:17,951][00194] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4082.1). Total num frames: 3203072. Throughput: 0: 1005.7. Samples: 802036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-06 21:49:17,958][00194] Avg episode reward: [(0, '23.736')]
+[2024-12-06 21:49:17,969][03821] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000782_3203072.pth...
+[2024-12-06 21:49:18,087][03821] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000542_2220032.pth
+[2024-12-06 21:49:22,951][00194] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3227648. Throughput: 0: 1010.6. Samples: 805598. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-12-06 21:49:22,954][00194] Avg episode reward: [(0, '23.175')]
+[2024-12-06 21:49:24,152][03834] Updated weights for policy 0, policy_version 790 (0.0025)
+[2024-12-06 21:49:27,954][00194] Fps is (10 sec: 4504.3, 60 sec: 4164.1, 300 sec: 4096.0). Total num frames: 3248128. Throughput: 0: 1059.3. Samples: 812250. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-12-06 21:49:27,957][00194] Avg episode reward: [(0, '23.915')]
+[2024-12-06 21:49:32,951][00194] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 3264512. Throughput: 0: 1000.5. Samples: 816830. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-06 21:49:32,959][00194] Avg episode reward: [(0, '24.889')]
+[2024-12-06 21:49:35,153][03834] Updated weights for policy 0, policy_version 800 (0.0022)
+[2024-12-06 21:49:37,951][00194] Fps is (10 sec: 4097.2, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 3289088. Throughput: 0: 1002.5. Samples: 820590. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-06 21:49:37,954][00194] Avg episode reward: [(0, '25.818')]
+[2024-12-06 21:49:42,958][00194] Fps is (10 sec: 4912.0, 60 sec: 4232.1, 300 sec: 4123.7). Total num frames: 3313664. Throughput: 0: 1054.3. Samples: 827920. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-06 21:49:42,960][00194] Avg episode reward: [(0, '27.392')]
+[2024-12-06 21:49:42,963][03821] Saving new best policy, reward=27.392!
+[2024-12-06 21:49:43,896][03834] Updated weights for policy 0, policy_version 810 (0.0020)
+[2024-12-06 21:49:47,951][00194] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 3325952. Throughput: 0: 1016.2. Samples: 832630. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
+[2024-12-06 21:49:47,957][00194] Avg episode reward: [(0, '27.838')]
+[2024-12-06 21:49:47,968][03821] Saving new best policy, reward=27.838!
+[2024-12-06 21:49:52,951][00194] Fps is (10 sec: 3688.8, 60 sec: 4096.3, 300 sec: 4096.0). Total num frames: 3350528. Throughput: 0: 1002.0. Samples: 835634. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-12-06 21:49:52,958][00194] Avg episode reward: [(0, '25.857')]
+[2024-12-06 21:49:54,606][03834] Updated weights for policy 0, policy_version 820 (0.0027)
+[2024-12-06 21:49:57,951][00194] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4123.8). Total num frames: 3375104. Throughput: 0: 1029.0. Samples: 842826. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-06 21:49:57,954][00194] Avg episode reward: [(0, '26.115')]
+[2024-12-06 21:50:02,955][00194] Fps is (10 sec: 4094.4, 60 sec: 4095.7, 300 sec: 4095.9). Total num frames: 3391488. Throughput: 0: 1028.4. Samples: 848318. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-12-06 21:50:02,957][00194] Avg episode reward: [(0, '25.152')]
+[2024-12-06 21:50:05,679][03834] Updated weights for policy 0, policy_version 830 (0.0028)
+[2024-12-06 21:50:07,951][00194] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4068.2). Total num frames: 3407872. Throughput: 0: 998.6. Samples: 850536. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-12-06 21:50:07,953][00194] Avg episode reward: [(0, '24.509')]
+[2024-12-06 21:50:12,951][00194] Fps is (10 sec: 4097.6, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 3432448. Throughput: 0: 1012.6. Samples: 857814. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-06 21:50:12,958][00194] Avg episode reward: [(0, '23.171')]
+[2024-12-06 21:50:14,115][03834] Updated weights for policy 0, policy_version 840 (0.0023)
+[2024-12-06 21:50:17,951][00194] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3452928. Throughput: 0: 1055.0. Samples: 864304. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-06 21:50:17,955][00194] Avg episode reward: [(0, '25.034')]
+[2024-12-06 21:50:22,951][00194] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 3469312. Throughput: 0: 1020.5. Samples: 866512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-06 21:50:22,957][00194] Avg episode reward: [(0, '23.958')]
+[2024-12-06 21:50:25,225][03834] Updated weights for policy 0, policy_version 850 (0.0033)
+[2024-12-06 21:50:27,951][00194] Fps is (10 sec: 4096.0, 60 sec: 4096.2, 300 sec: 4096.0). Total num frames: 3493888. Throughput: 0: 997.5. Samples: 872800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-06 21:50:27,961][00194] Avg episode reward: [(0, '24.247')]
+[2024-12-06 21:50:32,951][00194] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4123.9). Total num frames: 3518464. Throughput: 0: 1057.8. Samples: 880232. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-12-06 21:50:32,955][00194] Avg episode reward: [(0, '25.921')]
+[2024-12-06 21:50:33,875][03834] Updated weights for policy 0, policy_version 860 (0.0015)
+[2024-12-06 21:50:37,954][00194] Fps is (10 sec: 3685.5, 60 sec: 4027.6, 300 sec: 4082.1). Total num frames: 3530752. Throughput: 0: 1043.1. Samples: 882574. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-06 21:50:37,960][00194] Avg episode reward: [(0, '26.412')]
+[2024-12-06 21:50:42,951][00194] Fps is (10 sec: 3686.4, 60 sec: 4028.2, 300 sec: 4082.1). Total num frames: 3555328. Throughput: 0: 1005.3. Samples: 888064. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-12-06 21:50:42,953][00194] Avg episode reward: [(0, '26.808')]
+[2024-12-06 21:50:44,449][03834] Updated weights for policy 0, policy_version 870 (0.0018)
+[2024-12-06 21:50:47,951][00194] Fps is (10 sec: 4916.4, 60 sec: 4232.5, 300 sec: 4123.8). Total num frames: 3579904. Throughput: 0: 1048.8. Samples: 895512. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-06 21:50:47,961][00194] Avg episode reward: [(0, '27.655')]
+[2024-12-06 21:50:52,951][00194] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 3596288. Throughput: 0: 1069.9. Samples: 898680. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-06 21:50:52,954][00194] Avg episode reward: [(0, '26.739')]
+[2024-12-06 21:50:54,705][03834] Updated weights for policy 0, policy_version 880 (0.0016)
+[2024-12-06 21:50:57,951][00194] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4068.2). Total num frames: 3612672. Throughput: 0: 1007.1. Samples: 903132. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-12-06 21:50:57,953][00194] Avg episode reward: [(0, '25.566')]
+[2024-12-06 21:51:02,951][00194] Fps is (10 sec: 4096.0, 60 sec: 4096.3, 300 sec: 4096.0). Total num frames: 3637248. Throughput: 0: 1026.6. Samples: 910502. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-12-06 21:51:02,954][00194] Avg episode reward: [(0, '25.115')]
+[2024-12-06 21:51:03,894][03834] Updated weights for policy 0, policy_version 890 (0.0013)
+[2024-12-06 21:51:07,951][00194] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4109.9). Total num frames: 3661824. Throughput: 0: 1059.3. Samples: 914180. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-06 21:51:07,954][00194] Avg episode reward: [(0, '23.080')]
+[2024-12-06 21:51:12,951][00194] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 3674112. Throughput: 0: 1029.7. Samples: 919136. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-12-06 21:51:12,954][00194] Avg episode reward: [(0, '22.567')]
+[2024-12-06 21:51:14,794][03834] Updated weights for policy 0, policy_version 900 (0.0028)
+[2024-12-06 21:51:17,951][00194] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 3698688. Throughput: 0: 1011.0. Samples: 925726. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-12-06 21:51:17,954][00194] Avg episode reward: [(0, '22.615')]
+[2024-12-06 21:51:17,965][03821] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000903_3698688.pth...
+[2024-12-06 21:51:18,085][03821] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000663_2715648.pth
+[2024-12-06 21:51:22,951][00194] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4123.8). Total num frames: 3723264. Throughput: 0: 1038.6. Samples: 929308. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-06 21:51:22,954][00194] Avg episode reward: [(0, '21.386')]
+[2024-12-06 21:51:23,297][03834] Updated weights for policy 0, policy_version 910 (0.0018)
+[2024-12-06 21:51:27,951][00194] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 3739648. Throughput: 0: 1040.8. Samples: 934902. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-12-06 21:51:27,957][00194] Avg episode reward: [(0, '21.071')]
+[2024-12-06 21:51:32,951][00194] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 3760128. Throughput: 0: 1001.8. Samples: 940594. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-12-06 21:51:32,955][00194] Avg episode reward: [(0, '21.121')]
+[2024-12-06 21:51:34,517][03834] Updated weights for policy 0, policy_version 920 (0.0026)
+[2024-12-06 21:51:37,951][00194] Fps is (10 sec: 4505.5, 60 sec: 4232.7, 300 sec: 4123.8). Total num frames: 3784704. Throughput: 0: 1012.9. Samples: 944262. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-12-06 21:51:37,959][00194] Avg episode reward: [(0, '23.031')]
+[2024-12-06 21:51:42,953][00194] Fps is (10 sec: 4504.8, 60 sec: 4164.1, 300 sec: 4109.9). Total num frames: 3805184. Throughput: 0: 1064.0. Samples: 951016. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-06 21:51:42,959][00194] Avg episode reward: [(0, '23.513')]
+[2024-12-06 21:51:44,241][03834] Updated weights for policy 0, policy_version 930 (0.0032)
+[2024-12-06 21:51:47,951][00194] Fps is (10 sec: 3276.9, 60 sec: 3959.5, 300 sec: 4068.2). Total num frames: 3817472. Throughput: 0: 1007.1. Samples: 955820. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-06 21:51:47,959][00194] Avg episode reward: [(0, '24.581')]
+[2024-12-06 21:51:52,951][00194] Fps is (10 sec: 4096.8, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3846144. Throughput: 0: 1007.6. Samples: 959524. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-06 21:51:52,957][00194] Avg episode reward: [(0, '25.625')]
+[2024-12-06 21:51:53,732][03834] Updated weights for policy 0, policy_version 940 (0.0041)
+[2024-12-06 21:51:57,953][00194] Fps is (10 sec: 4914.2, 60 sec: 4232.4, 300 sec: 4109.9). Total num frames: 3866624. Throughput: 0: 1060.0. Samples: 966840. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-12-06 21:51:57,960][00194] Avg episode reward: [(0, '27.405')]
+[2024-12-06 21:52:02,952][00194] Fps is (10 sec: 3686.1, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 3883008. Throughput: 0: 1016.9. Samples: 971488. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-06 21:52:02,956][00194] Avg episode reward: [(0, '26.874')]
+[2024-12-06 21:52:04,767][03834] Updated weights for policy 0, policy_version 950 (0.0034)
+[2024-12-06 21:52:07,951][00194] Fps is (10 sec: 3687.0, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 3903488. Throughput: 0: 1006.6. Samples: 974604. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-12-06 21:52:07,958][00194] Avg episode reward: [(0, '26.386')]
+[2024-12-06 21:52:12,951][00194] Fps is (10 sec: 4505.9, 60 sec: 4232.5, 300 sec: 4109.9). Total num frames: 3928064. Throughput: 0: 1044.3. Samples: 981896. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-06 21:52:12,958][00194] Avg episode reward: [(0, '25.549')]
+[2024-12-06 21:52:13,124][03834] Updated weights for policy 0, policy_version 960 (0.0015)
+[2024-12-06 21:52:17,956][00194] Fps is (10 sec: 4503.7, 60 sec: 4164.0, 300 sec: 4109.8). Total num frames: 3948544. Throughput: 0: 1043.7. Samples: 987564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-06 21:52:17,958][00194] Avg episode reward: [(0, '25.216')]
+[2024-12-06 21:52:22,954][00194] Fps is (10 sec: 3685.4, 60 sec: 4027.6, 300 sec: 4082.1). Total num frames: 3964928. Throughput: 0: 1013.9. Samples: 989890. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-06 21:52:22,956][00194] Avg episode reward: [(0, '23.298')]
+[2024-12-06 21:52:24,105][03834] Updated weights for policy 0, policy_version 970 (0.0022)
+[2024-12-06 21:52:27,952][00194] Fps is (10 sec: 4097.6, 60 sec: 4164.2, 300 sec: 4109.9). Total num frames: 3989504. Throughput: 0: 1025.7. Samples: 997170. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-12-06 21:52:27,961][00194] Avg episode reward: [(0, '23.044')]
+[2024-12-06 21:52:30,850][03821] Stopping Batcher_0...
+[2024-12-06 21:52:30,851][03821] Loop batcher_evt_loop terminating...
+[2024-12-06 21:52:30,851][00194] Component Batcher_0 stopped!
+[2024-12-06 21:52:30,865][03821] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2024-12-06 21:52:30,937][03834] Weights refcount: 2 0
+[2024-12-06 21:52:30,942][00194] Component InferenceWorker_p0-w0 stopped!
+[2024-12-06 21:52:30,944][03834] Stopping InferenceWorker_p0-w0...
+[2024-12-06 21:52:30,945][03834] Loop inference_proc0-0_evt_loop terminating...
+[2024-12-06 21:52:31,040][03821] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000782_3203072.pth
+[2024-12-06 21:52:31,070][03821] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2024-12-06 21:52:31,296][00194] Component LearnerWorker_p0 stopped!
+[2024-12-06 21:52:31,298][03821] Stopping LearnerWorker_p0...
+[2024-12-06 21:52:31,300][03821] Loop learner_proc0_evt_loop terminating...
+[2024-12-06 21:52:31,466][00194] Component RolloutWorker_w4 stopped!
+[2024-12-06 21:52:31,469][03840] Stopping RolloutWorker_w4...
+[2024-12-06 21:52:31,472][03840] Loop rollout_proc4_evt_loop terminating...
+[2024-12-06 21:52:31,483][00194] Component RolloutWorker_w2 stopped!
+[2024-12-06 21:52:31,488][03837] Stopping RolloutWorker_w2...
+[2024-12-06 21:52:31,489][03837] Loop rollout_proc2_evt_loop terminating...
+[2024-12-06 21:52:31,539][00194] Component RolloutWorker_w0 stopped!
+[2024-12-06 21:52:31,544][03835] Stopping RolloutWorker_w0...
+[2024-12-06 21:52:31,545][03835] Loop rollout_proc0_evt_loop terminating...
+[2024-12-06 21:52:31,560][00194] Component RolloutWorker_w6 stopped!
+[2024-12-06 21:52:31,565][03841] Stopping RolloutWorker_w6...
+[2024-12-06 21:52:31,566][03841] Loop rollout_proc6_evt_loop terminating...
+[2024-12-06 21:52:31,614][03842] Stopping RolloutWorker_w7...
+[2024-12-06 21:52:31,614][00194] Component RolloutWorker_w7 stopped!
+[2024-12-06 21:52:31,653][03838] Stopping RolloutWorker_w3...
+[2024-12-06 21:52:31,654][03842] Loop rollout_proc7_evt_loop terminating...
+[2024-12-06 21:52:31,653][00194] Component RolloutWorker_w3 stopped!
+[2024-12-06 21:52:31,665][03838] Loop rollout_proc3_evt_loop terminating...
+[2024-12-06 21:52:31,680][00194] Component RolloutWorker_w1 stopped!
+[2024-12-06 21:52:31,680][03836] Stopping RolloutWorker_w1...
+[2024-12-06 21:52:31,685][03836] Loop rollout_proc1_evt_loop terminating...
+[2024-12-06 21:52:31,699][03839] Stopping RolloutWorker_w5...
+[2024-12-06 21:52:31,699][00194] Component RolloutWorker_w5 stopped!
+[2024-12-06 21:52:31,702][00194] Waiting for process learner_proc0 to stop...
+[2024-12-06 21:52:31,705][03839] Loop rollout_proc5_evt_loop terminating...
+[2024-12-06 21:52:33,299][00194] Waiting for process inference_proc0-0 to join...
+[2024-12-06 21:52:33,560][00194] Waiting for process rollout_proc0 to join...
+[2024-12-06 21:52:36,216][00194] Waiting for process rollout_proc1 to join...
+[2024-12-06 21:52:36,221][00194] Waiting for process rollout_proc2 to join...
+[2024-12-06 21:52:36,229][00194] Waiting for process rollout_proc3 to join...
+[2024-12-06 21:52:36,235][00194] Waiting for process rollout_proc4 to join...
+[2024-12-06 21:52:36,239][00194] Waiting for process rollout_proc5 to join...
+[2024-12-06 21:52:36,243][00194] Waiting for process rollout_proc6 to join...
+[2024-12-06 21:52:36,246][00194] Waiting for process rollout_proc7 to join...
+[2024-12-06 21:52:36,250][00194] Batcher 0 profile tree view:
+batching: 26.2187, releasing_batches: 0.0421
+[2024-12-06 21:52:36,251][00194] InferenceWorker_p0-w0 profile tree view:
+wait_policy: 0.0000
+ wait_policy_total: 403.2507
+update_model: 8.0717
+ weight_update: 0.0021
+one_step: 0.0095
+ handle_policy_step: 549.3549
+ deserialize: 14.2780, stack: 2.9103, obs_to_device_normalize: 119.0290, forward: 273.9383, send_messages: 27.0422
+ prepare_outputs: 84.7471
+ to_cpu: 51.3656
+[2024-12-06 21:52:36,253][00194] Learner 0 profile tree view:
+misc: 0.0056, prepare_batch: 13.2767
+train: 74.6824
+ epoch_init: 0.0159, minibatch_init: 0.0062, losses_postprocess: 0.6705, kl_divergence: 0.5929, after_optimizer: 34.4278
+ calculate_losses: 26.5465
+ losses_init: 0.0291, forward_head: 1.3518, bptt_initial: 17.9627, tail: 1.0303, advantages_returns: 0.2468, losses: 3.7575
+ bptt: 1.9123
+ bptt_forward_core: 1.8086
+ update: 11.8142
+ clip: 0.8521
+[2024-12-06 21:52:36,254][00194] RolloutWorker_w0 profile tree view:
+wait_for_trajectories: 0.3495, enqueue_policy_requests: 91.4223, env_step: 785.1041, overhead: 11.8296, complete_rollouts: 7.2625
+save_policy_outputs: 19.4289
+ split_output_tensors: 7.7735
+[2024-12-06 21:52:36,256][00194] RolloutWorker_w7 profile tree view:
+wait_for_trajectories: 0.2731, enqueue_policy_requests: 93.0694, env_step: 788.5925, overhead: 11.9627, complete_rollouts: 6.6416
+save_policy_outputs: 19.5025
+ split_output_tensors: 7.7461
+[2024-12-06 21:52:36,258][00194] Loop Runner_EvtLoop terminating...
+[2024-12-06 21:52:36,261][00194] Runner profile tree view:
+main_loop: 1032.5364
+[2024-12-06 21:52:36,262][00194] Collected {0: 4005888}, FPS: 3879.7
+[2024-12-06 22:05:37,308][00194] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2024-12-06 22:05:37,310][00194] Overriding arg 'num_workers' with value 1 passed from command line
+[2024-12-06 22:05:37,313][00194] Adding new argument 'no_render'=True that is not in the saved config file!
+[2024-12-06 22:05:37,315][00194] Adding new argument 'save_video'=True that is not in the saved config file!
+[2024-12-06 22:05:37,317][00194] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2024-12-06 22:05:37,319][00194] Adding new argument 'video_name'=None that is not in the saved config file!
+[2024-12-06 22:05:37,322][00194] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+[2024-12-06 22:05:37,323][00194] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2024-12-06 22:05:37,325][00194] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+[2024-12-06 22:05:37,326][00194] Adding new argument 'hf_repository'=None that is not in the saved config file!
+[2024-12-06 22:05:37,327][00194] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2024-12-06 22:05:37,328][00194] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2024-12-06 22:05:37,329][00194] Adding new argument 'train_script'=None that is not in the saved config file!
+[2024-12-06 22:05:37,330][00194] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2024-12-06 22:05:37,331][00194] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2024-12-06 22:05:37,363][00194] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-12-06 22:05:37,366][00194] RunningMeanStd input shape: (3, 72, 128)
+[2024-12-06 22:05:37,368][00194] RunningMeanStd input shape: (1,)
+[2024-12-06 22:05:37,384][00194] ConvEncoder: input_channels=3
+[2024-12-06 22:05:37,499][00194] Conv encoder output size: 512
+[2024-12-06 22:05:37,501][00194] Policy head output size: 512
+[2024-12-06 22:05:37,765][00194] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2024-12-06 22:05:38,559][00194] Num frames 100...
+[2024-12-06 22:05:38,684][00194] Num frames 200...
+[2024-12-06 22:05:38,811][00194] Num frames 300...
+[2024-12-06 22:05:38,935][00194] Num frames 400...
+[2024-12-06 22:05:39,059][00194] Num frames 500...
+[2024-12-06 22:05:39,186][00194] Num frames 600...
+[2024-12-06 22:05:39,311][00194] Num frames 700...
+[2024-12-06 22:05:39,435][00194] Num frames 800...
+[2024-12-06 22:05:39,568][00194] Num frames 900...
+[2024-12-06 22:05:39,658][00194] Avg episode rewards: #0: 20.280, true rewards: #0: 9.280
+[2024-12-06 22:05:39,660][00194] Avg episode reward: 20.280, avg true_objective: 9.280
+[2024-12-06 22:05:39,748][00194] Num frames 1000...
+[2024-12-06 22:05:39,883][00194] Num frames 1100...
+[2024-12-06 22:05:40,004][00194] Num frames 1200...
+[2024-12-06 22:05:40,127][00194] Num frames 1300...
+[2024-12-06 22:05:40,250][00194] Num frames 1400...
+[2024-12-06 22:05:40,374][00194] Num frames 1500...
+[2024-12-06 22:05:40,500][00194] Num frames 1600...
+[2024-12-06 22:05:40,629][00194] Num frames 1700...
+[2024-12-06 22:05:40,797][00194] Avg episode rewards: #0: 18.950, true rewards: #0: 8.950
+[2024-12-06 22:05:40,799][00194] Avg episode reward: 18.950, avg true_objective: 8.950
+[2024-12-06 22:05:40,817][00194] Num frames 1800...
+[2024-12-06 22:05:40,945][00194] Num frames 1900...
+[2024-12-06 22:05:41,070][00194] Num frames 2000...
+[2024-12-06 22:05:41,194][00194] Num frames 2100...
+[2024-12-06 22:05:41,316][00194] Num frames 2200...
+[2024-12-06 22:05:41,441][00194] Num frames 2300...
+[2024-12-06 22:05:41,571][00194] Num frames 2400...
+[2024-12-06 22:05:41,697][00194] Num frames 2500...
+[2024-12-06 22:05:41,825][00194] Num frames 2600...
+[2024-12-06 22:05:41,907][00194] Avg episode rewards: #0: 19.073, true rewards: #0: 8.740
+[2024-12-06 22:05:41,908][00194] Avg episode reward: 19.073, avg true_objective: 8.740
+[2024-12-06 22:05:42,034][00194] Num frames 2700...
+[2024-12-06 22:05:42,210][00194] Num frames 2800...
+[2024-12-06 22:05:42,377][00194] Num frames 2900...
+[2024-12-06 22:05:42,538][00194] Num frames 3000...
+[2024-12-06 22:05:42,703][00194] Num frames 3100...
+[2024-12-06 22:05:42,875][00194] Num frames 3200...
+[2024-12-06 22:05:43,054][00194] Num frames 3300...
+[2024-12-06 22:05:43,232][00194] Num frames 3400...
+[2024-12-06 22:05:43,396][00194] Num frames 3500...
+[2024-12-06 22:05:43,564][00194] Num frames 3600...
+[2024-12-06 22:05:43,741][00194] Num frames 3700...
+[2024-12-06 22:05:43,926][00194] Num frames 3800...
+[2024-12-06 22:05:44,098][00194] Num frames 3900...
+[2024-12-06 22:05:44,161][00194] Avg episode rewards: #0: 23.255, true rewards: #0: 9.755
+[2024-12-06 22:05:44,163][00194] Avg episode reward: 23.255, avg true_objective: 9.755
+[2024-12-06 22:05:44,333][00194] Num frames 4000...
+[2024-12-06 22:05:44,497][00194] Num frames 4100...
+[2024-12-06 22:05:44,626][00194] Num frames 4200...
+[2024-12-06 22:05:44,758][00194] Num frames 4300...
+[2024-12-06 22:05:44,887][00194] Num frames 4400...
+[2024-12-06 22:05:45,017][00194] Num frames 4500...
+[2024-12-06 22:05:45,151][00194] Num frames 4600...
+[2024-12-06 22:05:45,273][00194] Num frames 4700...
+[2024-12-06 22:05:45,395][00194] Num frames 4800...
+[2024-12-06 22:05:45,565][00194] Avg episode rewards: #0: 23.588, true rewards: #0: 9.788
+[2024-12-06 22:05:45,567][00194] Avg episode reward: 23.588, avg true_objective: 9.788
+[2024-12-06 22:05:45,577][00194] Num frames 4900...
+[2024-12-06 22:05:45,701][00194] Num frames 5000...
+[2024-12-06 22:05:45,842][00194] Num frames 5100...
+[2024-12-06 22:05:45,967][00194] Num frames 5200...
+[2024-12-06 22:05:46,090][00194] Num frames 5300...
+[2024-12-06 22:05:46,212][00194] Num frames 5400...
+[2024-12-06 22:05:46,336][00194] Num frames 5500...
+[2024-12-06 22:05:46,461][00194] Num frames 5600...
+[2024-12-06 22:05:46,583][00194] Num frames 5700...
+[2024-12-06 22:05:46,712][00194] Num frames 5800...
+[2024-12-06 22:05:46,854][00194] Num frames 5900...
+[2024-12-06 22:05:46,976][00194] Num frames 6000...
+[2024-12-06 22:05:47,101][00194] Num frames 6100...
+[2024-12-06 22:05:47,209][00194] Avg episode rewards: #0: 24.403, true rewards: #0: 10.237
+[2024-12-06 22:05:47,211][00194] Avg episode reward: 24.403, avg true_objective: 10.237
+[2024-12-06 22:05:47,288][00194] Num frames 6200...
+[2024-12-06 22:05:47,416][00194] Num frames 6300...
+[2024-12-06 22:05:47,557][00194] Num frames 6400...
+[2024-12-06 22:05:47,685][00194] Num frames 6500...
+[2024-12-06 22:05:47,820][00194] Num frames 6600...
+[2024-12-06 22:05:47,943][00194] Num frames 6700...
+[2024-12-06 22:05:48,068][00194] Num frames 6800...
+[2024-12-06 22:05:48,187][00194] Num frames 6900...
+[2024-12-06 22:05:48,310][00194] Num frames 7000...
+[2024-12-06 22:05:48,438][00194] Num frames 7100...
+[2024-12-06 22:05:48,576][00194] Avg episode rewards: #0: 24.523, true rewards: #0: 10.237
+[2024-12-06 22:05:48,578][00194] Avg episode reward: 24.523, avg true_objective: 10.237
+[2024-12-06 22:05:48,620][00194] Num frames 7200...
+[2024-12-06 22:05:48,742][00194] Num frames 7300...
+[2024-12-06 22:05:48,881][00194] Num frames 7400...
+[2024-12-06 22:05:49,005][00194] Num frames 7500...
+[2024-12-06 22:05:49,129][00194] Num frames 7600...
+[2024-12-06 22:05:49,252][00194] Num frames 7700...
+[2024-12-06 22:05:49,387][00194] Num frames 7800...
+[2024-12-06 22:05:49,530][00194] Num frames 7900...
+[2024-12-06 22:05:49,673][00194] Num frames 8000...
+[2024-12-06 22:05:49,801][00194] Num frames 8100...
+[2024-12-06 22:05:49,940][00194] Num frames 8200...
+[2024-12-06 22:05:50,080][00194] Num frames 8300...
+[2024-12-06 22:05:50,210][00194] Num frames 8400...
+[2024-12-06 22:05:50,338][00194] Num frames 8500...
+[2024-12-06 22:05:50,461][00194] Num frames 8600...
+[2024-12-06 22:05:50,605][00194] Num frames 8700...
+[2024-12-06 22:05:50,739][00194] Num frames 8800...
+[2024-12-06 22:05:50,879][00194] Avg episode rewards: #0: 26.078, true rewards: #0: 11.077
+[2024-12-06 22:05:50,881][00194] Avg episode reward: 26.078, avg true_objective: 11.077
+[2024-12-06 22:05:50,928][00194] Num frames 8900...
+[2024-12-06 22:05:51,052][00194] Num frames 9000...
+[2024-12-06 22:05:51,173][00194] Num frames 9100...
+[2024-12-06 22:05:51,296][00194] Num frames 9200...
+[2024-12-06 22:05:51,468][00194] Avg episode rewards: #0: 24.109, true rewards: #0: 10.331
+[2024-12-06 22:05:51,470][00194] Avg episode reward: 24.109, avg true_objective: 10.331
+[2024-12-06 22:05:51,475][00194] Num frames 9300...
+[2024-12-06 22:05:51,594][00194] Num frames 9400...
+[2024-12-06 22:05:51,716][00194] Num frames 9500...
+[2024-12-06 22:05:51,852][00194] Num frames 9600...
+[2024-12-06 22:05:51,984][00194] Num frames 9700...
+[2024-12-06 22:05:52,105][00194] Num frames 9800...
+[2024-12-06 22:05:52,229][00194] Num frames 9900...
+[2024-12-06 22:05:52,353][00194] Num frames 10000...
+[2024-12-06 22:05:52,451][00194] Avg episode rewards: #0: 23.034, true rewards: #0: 10.034
+[2024-12-06 22:05:52,453][00194] Avg episode reward: 23.034, avg true_objective: 10.034
+[2024-12-06 22:06:49,040][00194] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+[2024-12-06 22:15:06,050][00194] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2024-12-06 22:15:06,052][00194] Overriding arg 'num_workers' with value 1 passed from command line
+[2024-12-06 22:15:06,054][00194] Adding new argument 'no_render'=True that is not in the saved config file!
+[2024-12-06 22:15:06,056][00194] Adding new argument 'save_video'=True that is not in the saved config file!
+[2024-12-06 22:15:06,058][00194] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2024-12-06 22:15:06,060][00194] Adding new argument 'video_name'=None that is not in the saved config file!
+[2024-12-06 22:15:06,062][00194] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2024-12-06 22:15:06,063][00194] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2024-12-06 22:15:06,064][00194] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2024-12-06 22:15:06,065][00194] Adding new argument 'hf_repository'='ThomasSimonini/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2024-12-06 22:15:06,066][00194] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2024-12-06 22:15:06,067][00194] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2024-12-06 22:15:06,068][00194] Adding new argument 'train_script'=None that is not in the saved config file!
+[2024-12-06 22:15:06,069][00194] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2024-12-06 22:15:06,070][00194] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2024-12-06 22:15:06,098][00194] RunningMeanStd input shape: (3, 72, 128)
+[2024-12-06 22:15:06,099][00194] RunningMeanStd input shape: (1,)
+[2024-12-06 22:15:06,112][00194] ConvEncoder: input_channels=3
+[2024-12-06 22:15:06,150][00194] Conv encoder output size: 512
+[2024-12-06 22:15:06,151][00194] Policy head output size: 512
+[2024-12-06 22:15:06,170][00194] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2024-12-06 22:15:06,581][00194] Num frames 100...
+[2024-12-06 22:15:06,709][00194] Num frames 200...
+[2024-12-06 22:15:06,841][00194] Num frames 300...
+[2024-12-06 22:15:06,971][00194] Num frames 400...
+[2024-12-06 22:15:07,094][00194] Num frames 500...
+[2024-12-06 22:15:07,218][00194] Num frames 600...
+[2024-12-06 22:15:07,342][00194] Num frames 700...
+[2024-12-06 22:15:07,477][00194] Num frames 800...
+[2024-12-06 22:15:07,529][00194] Avg episode rewards: #0: 16.000, true rewards: #0: 8.000
+[2024-12-06 22:15:07,530][00194] Avg episode reward: 16.000, avg true_objective: 8.000
+[2024-12-06 22:15:07,654][00194] Num frames 900...
+[2024-12-06 22:15:07,774][00194] Num frames 1000...
+[2024-12-06 22:15:07,901][00194] Num frames 1100...
+[2024-12-06 22:15:08,025][00194] Num frames 1200...
+[2024-12-06 22:15:08,151][00194] Num frames 1300...
+[2024-12-06 22:15:08,272][00194] Num frames 1400...
+[2024-12-06 22:15:08,393][00194] Num frames 1500...
+[2024-12-06 22:15:08,524][00194] Num frames 1600...
+[2024-12-06 22:15:08,646][00194] Num frames 1700...
+[2024-12-06 22:15:08,770][00194] Num frames 1800...
+[2024-12-06 22:15:08,898][00194] Num frames 1900...
+[2024-12-06 22:15:09,020][00194] Num frames 2000...
+[2024-12-06 22:15:09,096][00194] Avg episode rewards: #0: 20.580, true rewards: #0: 10.080
+[2024-12-06 22:15:09,098][00194] Avg episode reward: 20.580, avg true_objective: 10.080
+[2024-12-06 22:15:09,202][00194] Num frames 2100...
+[2024-12-06 22:15:09,323][00194] Num frames 2200...
+[2024-12-06 22:15:09,446][00194] Num frames 2300...
+[2024-12-06 22:15:09,576][00194] Num frames 2400...
+[2024-12-06 22:15:09,699][00194] Num frames 2500...
+[2024-12-06 22:15:09,827][00194] Num frames 2600...
+[2024-12-06 22:15:09,949][00194] Num frames 2700...
+[2024-12-06 22:15:10,071][00194] Num frames 2800...
+[2024-12-06 22:15:10,189][00194] Num frames 2900...
+[2024-12-06 22:15:10,311][00194] Num frames 3000...
+[2024-12-06 22:15:10,432][00194] Num frames 3100...
+[2024-12-06 22:15:10,557][00194] Avg episode rewards: #0: 22.504, true rewards: #0: 10.503
+[2024-12-06 22:15:10,559][00194] Avg episode reward: 22.504, avg true_objective: 10.503
+[2024-12-06 22:15:10,621][00194] Num frames 3200...
+[2024-12-06 22:15:10,743][00194] Num frames 3300...
+[2024-12-06 22:15:10,879][00194] Num frames 3400...
+[2024-12-06 22:15:11,055][00194] Num frames 3500...
+[2024-12-06 22:15:11,222][00194] Num frames 3600...
+[2024-12-06 22:15:11,387][00194] Num frames 3700...
+[2024-12-06 22:15:11,552][00194] Num frames 3800...
+[2024-12-06 22:15:11,715][00194] Num frames 3900...
+[2024-12-06 22:15:11,891][00194] Num frames 4000...
+[2024-12-06 22:15:12,051][00194] Num frames 4100...
+[2024-12-06 22:15:12,219][00194] Num frames 4200...
+[2024-12-06 22:15:12,393][00194] Num frames 4300...
+[2024-12-06 22:15:12,572][00194] Num frames 4400...
+[2024-12-06 22:15:12,747][00194] Num frames 4500...
+[2024-12-06 22:15:12,924][00194] Num frames 4600...
+[2024-12-06 22:15:13,100][00194] Num frames 4700...
+[2024-12-06 22:15:13,270][00194] Num frames 4800...
+[2024-12-06 22:15:13,445][00194] Num frames 4900...
+[2024-12-06 22:15:13,578][00194] Num frames 5000...
+[2024-12-06 22:15:13,707][00194] Num frames 5100...
+[2024-12-06 22:15:13,832][00194] Num frames 5200...
+[2024-12-06 22:15:13,946][00194] Avg episode rewards: #0: 34.377, true rewards: #0: 13.127
+[2024-12-06 22:15:13,948][00194] Avg episode reward: 34.377, avg true_objective: 13.127
+[2024-12-06 22:15:14,007][00194] Num frames 5300...
+[2024-12-06 22:15:14,125][00194] Num frames 5400...
+[2024-12-06 22:15:14,258][00194] Num frames 5500...
+[2024-12-06 22:15:14,382][00194] Num frames 5600...
+[2024-12-06 22:15:14,508][00194] Num frames 5700...
+[2024-12-06 22:15:14,635][00194] Num frames 5800...
+[2024-12-06 22:15:14,765][00194] Num frames 5900...
+[2024-12-06 22:15:14,895][00194] Num frames 6000...
+[2024-12-06 22:15:15,018][00194] Num frames 6100...
+[2024-12-06 22:15:15,143][00194] Num frames 6200...
+[2024-12-06 22:15:15,264][00194] Num frames 6300...
+[2024-12-06 22:15:15,391][00194] Num frames 6400...
+[2024-12-06 22:15:15,516][00194] Num frames 6500...
+[2024-12-06 22:15:15,645][00194] Num frames 6600...
+[2024-12-06 22:15:15,778][00194] Num frames 6700...
+[2024-12-06 22:15:15,908][00194] Num frames 6800...
+[2024-12-06 22:15:16,033][00194] Num frames 6900...
+[2024-12-06 22:15:16,107][00194] Avg episode rewards: #0: 36.430, true rewards: #0: 13.830
+[2024-12-06 22:15:16,110][00194] Avg episode reward: 36.430, avg true_objective: 13.830
+[2024-12-06 22:15:16,212][00194] Num frames 7000...
+[2024-12-06 22:15:16,330][00194] Num frames 7100...
+[2024-12-06 22:15:16,453][00194] Num frames 7200...
+[2024-12-06 22:15:16,573][00194] Num frames 7300...
+[2024-12-06 22:15:16,703][00194] Num frames 7400...
+[2024-12-06 22:15:16,831][00194] Num frames 7500...
+[2024-12-06 22:15:16,955][00194] Num frames 7600...
+[2024-12-06 22:15:17,079][00194] Num frames 7700...
+[2024-12-06 22:15:17,200][00194] Num frames 7800...
+[2024-12-06 22:15:17,326][00194] Num frames 7900...
+[2024-12-06 22:15:17,447][00194] Num frames 8000...
+[2024-12-06 22:15:17,572][00194] Num frames 8100...
+[2024-12-06 22:15:17,697][00194] Num frames 8200...
+[2024-12-06 22:15:17,830][00194] Num frames 8300...
+[2024-12-06 22:15:17,952][00194] Num frames 8400...
+[2024-12-06 22:15:18,071][00194] Num frames 8500...
+[2024-12-06 22:15:18,189][00194] Num frames 8600...
+[2024-12-06 22:15:18,308][00194] Num frames 8700...
+[2024-12-06 22:15:18,437][00194] Num frames 8800...
+[2024-12-06 22:15:18,561][00194] Num frames 8900...
+[2024-12-06 22:15:18,690][00194] Num frames 9000...
+[2024-12-06 22:15:18,765][00194] Avg episode rewards: #0: 40.191, true rewards: #0: 15.025
+[2024-12-06 22:15:18,767][00194] Avg episode reward: 40.191, avg true_objective: 15.025
+[2024-12-06 22:15:18,875][00194] Num frames 9100...
+[2024-12-06 22:15:19,001][00194] Num frames 9200...
+[2024-12-06 22:15:19,125][00194] Num frames 9300...
+[2024-12-06 22:15:19,251][00194] Num frames 9400...
+[2024-12-06 22:15:19,371][00194] Num frames 9500...
+[2024-12-06 22:15:19,494][00194] Num frames 9600...
+[2024-12-06 22:15:19,622][00194] Num frames 9700...
+[2024-12-06 22:15:19,746][00194] Num frames 9800...
+[2024-12-06 22:15:19,884][00194] Num frames 9900...
+[2024-12-06 22:15:20,004][00194] Num frames 10000...
+[2024-12-06 22:15:20,126][00194] Num frames 10100...
+[2024-12-06 22:15:20,226][00194] Avg episode rewards: #0: 38.050, true rewards: #0: 14.479
+[2024-12-06 22:15:20,227][00194] Avg episode reward: 38.050, avg true_objective: 14.479
+[2024-12-06 22:15:20,305][00194] Num frames 10200...
+[2024-12-06 22:15:20,427][00194] Num frames 10300...
+[2024-12-06 22:15:20,550][00194] Num frames 10400...
+[2024-12-06 22:15:20,673][00194] Num frames 10500...
+[2024-12-06 22:15:20,806][00194] Num frames 10600...
+[2024-12-06 22:15:20,921][00194] Avg episode rewards: #0: 34.433, true rewards: #0: 13.309
+[2024-12-06 22:15:20,922][00194] Avg episode reward: 34.433, avg true_objective: 13.309
+[2024-12-06 22:15:20,989][00194] Num frames 10700...
+[2024-12-06 22:15:21,116][00194] Num frames 10800...
+[2024-12-06 22:15:21,240][00194] Num frames 10900...
+[2024-12-06 22:15:21,362][00194] Num frames 11000...
+[2024-12-06 22:15:21,485][00194] Num frames 11100...
+[2024-12-06 22:15:21,608][00194] Num frames 11200...
+[2024-12-06 22:15:21,738][00194] Num frames 11300...
+[2024-12-06 22:15:21,873][00194] Num frames 11400...
+[2024-12-06 22:15:21,998][00194] Num frames 11500...
+[2024-12-06 22:15:22,146][00194] Avg episode rewards: #0: 32.861, true rewards: #0: 12.861
+[2024-12-06 22:15:22,147][00194] Avg episode reward: 32.861, avg true_objective: 12.861
+[2024-12-06 22:15:22,179][00194] Num frames 11600...
+[2024-12-06 22:15:22,302][00194] Num frames 11700...
+[2024-12-06 22:15:22,421][00194] Num frames 11800...
+[2024-12-06 22:15:22,545][00194] Num frames 11900...
+[2024-12-06 22:15:22,619][00194] Avg episode rewards: #0: 30.113, true rewards: #0: 11.913
+[2024-12-06 22:15:22,621][00194] Avg episode reward: 30.113, avg true_objective: 11.913
+[2024-12-06 22:16:30,708][00194] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+[2024-12-06 22:19:42,464][00194] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2024-12-06 22:19:42,466][00194] Overriding arg 'num_workers' with value 1 passed from command line
+[2024-12-06 22:19:42,468][00194] Adding new argument 'no_render'=True that is not in the saved config file!
+[2024-12-06 22:19:42,469][00194] Adding new argument 'save_video'=True that is not in the saved config file!
+[2024-12-06 22:19:42,471][00194] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2024-12-06 22:19:42,472][00194] Adding new argument 'video_name'=None that is not in the saved config file!
+[2024-12-06 22:19:42,474][00194] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2024-12-06 22:19:42,475][00194] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2024-12-06 22:19:42,476][00194] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2024-12-06 22:19:42,478][00194] Adding new argument 'hf_repository'='ZachXie/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2024-12-06 22:19:42,479][00194] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2024-12-06 22:19:42,480][00194] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2024-12-06 22:19:42,482][00194] Adding new argument 'train_script'=None that is not in the saved config file!
+[2024-12-06 22:19:42,483][00194] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2024-12-06 22:19:42,486][00194] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2024-12-06 22:19:42,520][00194] RunningMeanStd input shape: (3, 72, 128)
+[2024-12-06 22:19:42,522][00194] RunningMeanStd input shape: (1,)
+[2024-12-06 22:19:42,536][00194] ConvEncoder: input_channels=3
+[2024-12-06 22:19:42,577][00194] Conv encoder output size: 512
+[2024-12-06 22:19:42,579][00194] Policy head output size: 512
+[2024-12-06 22:19:42,599][00194] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2024-12-06 22:19:43,029][00194] Num frames 100...
+[2024-12-06 22:19:43,149][00194] Num frames 200...
+[2024-12-06 22:19:43,271][00194] Num frames 300...
+[2024-12-06 22:19:43,393][00194] Num frames 400...
+[2024-12-06 22:19:43,524][00194] Num frames 500...
+[2024-12-06 22:19:43,646][00194] Num frames 600...
+[2024-12-06 22:19:43,769][00194] Num frames 700...
+[2024-12-06 22:19:43,932][00194] Num frames 800...
+[2024-12-06 22:19:44,099][00194] Num frames 900...
+[2024-12-06 22:19:44,268][00194] Num frames 1000...
+[2024-12-06 22:19:44,437][00194] Num frames 1100...
+[2024-12-06 22:19:44,624][00194] Num frames 1200...
+[2024-12-06 22:19:44,803][00194] Num frames 1300...
+[2024-12-06 22:19:44,975][00194] Num frames 1400...
+[2024-12-06 22:19:45,145][00194] Num frames 1500...
+[2024-12-06 22:19:45,313][00194] Num frames 1600...
+[2024-12-06 22:19:45,488][00194] Num frames 1700...
+[2024-12-06 22:19:45,643][00194] Avg episode rewards: #0: 48.569, true rewards: #0: 17.570
+[2024-12-06 22:19:45,644][00194] Avg episode reward: 48.569, avg true_objective: 17.570
+[2024-12-06 22:19:45,724][00194] Num frames 1800...
+[2024-12-06 22:19:45,908][00194] Num frames 1900...
+[2024-12-06 22:19:46,085][00194] Num frames 2000...
+[2024-12-06 22:19:46,259][00194] Num frames 2100...
+[2024-12-06 22:19:46,414][00194] Num frames 2200...
+[2024-12-06 22:19:46,539][00194] Num frames 2300...
+[2024-12-06 22:19:46,671][00194] Num frames 2400...
+[2024-12-06 22:19:46,793][00194] Num frames 2500...
+[2024-12-06 22:19:46,920][00194] Num frames 2600...
+[2024-12-06 22:19:47,043][00194] Num frames 2700...
+[2024-12-06 22:19:47,167][00194] Num frames 2800...
+[2024-12-06 22:19:47,290][00194] Num frames 2900...
+[2024-12-06 22:19:47,413][00194] Num frames 3000...
+[2024-12-06 22:19:47,536][00194] Num frames 3100...
+[2024-12-06 22:19:47,680][00194] Num frames 3200...
+[2024-12-06 22:19:47,811][00194] Num frames 3300...
+[2024-12-06 22:19:47,943][00194] Num frames 3400...
+[2024-12-06 22:19:48,069][00194] Num frames 3500...
+[2024-12-06 22:19:48,196][00194] Num frames 3600...
+[2024-12-06 22:19:48,318][00194] Num frames 3700...
+[2024-12-06 22:19:48,437][00194] Num frames 3800...
+[2024-12-06 22:19:48,561][00194] Avg episode rewards: #0: 53.784, true rewards: #0: 19.285
+[2024-12-06 22:19:48,563][00194] Avg episode reward: 53.784, avg true_objective: 19.285
+[2024-12-06 22:19:48,615][00194] Num frames 3900...
+[2024-12-06 22:19:48,748][00194] Num frames 4000...
+[2024-12-06 22:19:48,874][00194] Num frames 4100...
+[2024-12-06 22:19:48,993][00194] Num frames 4200...
+[2024-12-06 22:19:49,111][00194] Num frames 4300...
+[2024-12-06 22:19:49,228][00194] Num frames 4400...
+[2024-12-06 22:19:49,348][00194] Num frames 4500...
+[2024-12-06 22:19:49,464][00194] Num frames 4600...
+[2024-12-06 22:19:49,583][00194] Num frames 4700...
+[2024-12-06 22:19:49,710][00194] Num frames 4800...
+[2024-12-06 22:19:49,838][00194] Num frames 4900...
+[2024-12-06 22:19:49,959][00194] Num frames 5000...
+[2024-12-06 22:19:50,078][00194] Avg episode rewards: #0: 44.176, true rewards: #0: 16.843
+[2024-12-06 22:19:50,080][00194] Avg episode reward: 44.176, avg true_objective: 16.843
+[2024-12-06 22:19:50,137][00194] Num frames 5100...
+[2024-12-06 22:19:50,255][00194] Num frames 5200...
+[2024-12-06 22:19:50,373][00194] Num frames 5300...
+[2024-12-06 22:19:50,491][00194] Num frames 5400...
+[2024-12-06 22:19:50,614][00194] Num frames 5500...
+[2024-12-06 22:19:50,742][00194] Num frames 5600...
+[2024-12-06 22:19:50,869][00194] Num frames 5700...
+[2024-12-06 22:19:50,987][00194] Num frames 5800...
+[2024-12-06 22:19:51,110][00194] Num frames 5900...
+[2024-12-06 22:19:51,230][00194] Num frames 6000...
+[2024-12-06 22:19:51,352][00194] Num frames 6100...
+[2024-12-06 22:19:51,479][00194] Num frames 6200...
+[2024-12-06 22:19:51,602][00194] Num frames 6300...
+[2024-12-06 22:19:51,716][00194] Avg episode rewards: #0: 40.870, true rewards: #0: 15.870
+[2024-12-06 22:19:51,718][00194] Avg episode reward: 40.870, avg true_objective: 15.870
+[2024-12-06 22:19:51,791][00194] Num frames 6400...
+[2024-12-06 22:19:51,917][00194] Num frames 6500...
+[2024-12-06 22:19:52,045][00194] Num frames 6600...
+[2024-12-06 22:19:52,166][00194] Num frames 6700...
+[2024-12-06 22:19:52,285][00194] Num frames 6800...
+[2024-12-06 22:19:52,406][00194] Num frames 6900...
+[2024-12-06 22:19:52,527][00194] Num frames 7000...
+[2024-12-06 22:19:52,648][00194] Num frames 7100...
+[2024-12-06 22:19:52,778][00194] Num frames 7200...
+[2024-12-06 22:19:52,894][00194] Avg episode rewards: #0: 35.688, true rewards: #0: 14.488
+[2024-12-06 22:19:52,895][00194] Avg episode reward: 35.688, avg true_objective: 14.488
+[2024-12-06 22:19:52,967][00194] Num frames 7300...
+[2024-12-06 22:19:53,091][00194] Num frames 7400...
+[2024-12-06 22:19:53,211][00194] Num frames 7500...
+[2024-12-06 22:19:53,337][00194] Num frames 7600...
+[2024-12-06 22:19:53,457][00194] Num frames 7700...
+[2024-12-06 22:19:53,578][00194] Num frames 7800...
+[2024-12-06 22:19:53,702][00194] Num frames 7900...
+[2024-12-06 22:19:53,837][00194] Num frames 8000...
+[2024-12-06 22:19:53,960][00194] Num frames 8100...
+[2024-12-06 22:19:54,080][00194] Num frames 8200...
+[2024-12-06 22:19:54,219][00194] Avg episode rewards: #0: 33.611, true rewards: #0: 13.778
+[2024-12-06 22:19:54,220][00194] Avg episode reward: 33.611, avg true_objective: 13.778
+[2024-12-06 22:19:54,262][00194] Num frames 8300...
+[2024-12-06 22:19:54,381][00194] Num frames 8400...
+[2024-12-06 22:19:54,503][00194] Num frames 8500...
+[2024-12-06 22:19:54,624][00194] Num frames 8600...
+[2024-12-06 22:19:54,750][00194] Num frames 8700...
+[2024-12-06 22:19:54,884][00194] Num frames 8800...
+[2024-12-06 22:19:55,010][00194] Num frames 8900...
+[2024-12-06 22:19:55,135][00194] Num frames 9000...
+[2024-12-06 22:19:55,272][00194] Avg episode rewards: #0: 31.238, true rewards: #0: 12.953
+[2024-12-06 22:19:55,274][00194] Avg episode reward: 31.238, avg true_objective: 12.953
+[2024-12-06 22:19:55,316][00194] Num frames 9100...
+[2024-12-06 22:19:55,434][00194] Num frames 9200...
+[2024-12-06 22:19:55,556][00194] Num frames 9300...
+[2024-12-06 22:19:55,680][00194] Num frames 9400...
+[2024-12-06 22:19:55,809][00194] Num frames 9500...
+[2024-12-06 22:19:55,939][00194] Num frames 9600...
+[2024-12-06 22:19:56,063][00194] Num frames 9700...
+[2024-12-06 22:19:56,282][00194] Num frames 9800...
+[2024-12-06 22:19:56,445][00194] Num frames 9900...
+[2024-12-06 22:19:56,617][00194] Num frames 10000...
+[2024-12-06 22:19:56,780][00194] Num frames 10100...
+[2024-12-06 22:19:57,153][00194] Num frames 10200...
+[2024-12-06 22:19:57,315][00194] Num frames 10300...
+[2024-12-06 22:19:57,490][00194] Num frames 10400...
+[2024-12-06 22:19:57,665][00194] Num frames 10500...
+[2024-12-06 22:19:58,090][00194] Num frames 10600...
+[2024-12-06 22:19:58,266][00194] Num frames 10700...
+[2024-12-06 22:19:58,469][00194] Num frames 10800...
+[2024-12-06 22:19:58,641][00194] Num frames 10900...
+[2024-12-06 22:19:58,829][00194] Num frames 11000...
+[2024-12-06 22:19:59,020][00194] Num frames 11100...
+[2024-12-06 22:19:59,186][00194] Avg episode rewards: #0: 34.208, true rewards: #0: 13.959
+[2024-12-06 22:19:59,188][00194] Avg episode reward: 34.208, avg true_objective: 13.959
+[2024-12-06 22:19:59,232][00194] Num frames 11200...
+[2024-12-06 22:19:59,359][00194] Num frames 11300...
+[2024-12-06 22:19:59,482][00194] Num frames 11400...
+[2024-12-06 22:19:59,609][00194] Num frames 11500...
+[2024-12-06 22:19:59,730][00194] Num frames 11600...
+[2024-12-06 22:19:59,863][00194] Num frames 11700...
+[2024-12-06 22:20:00,037][00194] Num frames 11800...
+[2024-12-06 22:20:00,200][00194] Num frames 11900...
+[2024-12-06 22:20:00,384][00194] Num frames 12000...
+[2024-12-06 22:20:00,514][00194] Num frames 12100...
+[2024-12-06 22:20:00,635][00194] Num frames 12200...
+[2024-12-06 22:20:00,722][00194] Avg episode rewards: #0: 32.918, true rewards: #0: 13.584
+[2024-12-06 22:20:00,724][00194] Avg episode reward: 32.918, avg true_objective: 13.584
+[2024-12-06 22:20:00,825][00194] Num frames 12300...
+[2024-12-06 22:20:00,946][00194] Num frames 12400...
+[2024-12-06 22:20:01,080][00194] Num frames 12500...
+[2024-12-06 22:20:01,203][00194] Num frames 12600...
+[2024-12-06 22:20:01,324][00194] Num frames 12700...
+[2024-12-06 22:20:01,447][00194] Num frames 12800...
+[2024-12-06 22:20:01,570][00194] Num frames 12900...
+[2024-12-06 22:20:01,711][00194] Num frames 13000...
+[2024-12-06 22:20:01,849][00194] Num frames 13100...
+[2024-12-06 22:20:01,977][00194] Num frames 13200...
+[2024-12-06 22:20:02,099][00194] Avg episode rewards: #0: 31.750, true rewards: #0: 13.250
+[2024-12-06 22:20:02,101][00194] Avg episode reward: 31.750, avg true_objective: 13.250
+[2024-12-06 22:21:15,889][00194] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
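
The checkpoint used by all three evaluation runs, checkpoint_000000978_4005888.pth, appears to encode the policy version (978) and the cumulative environment frame count (4,005,888, matching the final "Collected {0: 4005888}" entry; compare checkpoint_000000663_2715648.pth saved at 2,715,648 frames). A minimal inspection sketch with plain PyTorch; the internal key layout of Sample Factory checkpoints is not documented in this log, so the code makes no assumptions about it and simply reports whatever structure it finds:

    import torch

    # Path copied verbatim from the log above; adjust train_dir if yours differs.
    CKPT = "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth"

    # weights_only=False allows full unpickling (the default changed in PyTorch 2.6).
    checkpoint = torch.load(CKPT, map_location="cpu", weights_only=False)

    if isinstance(checkpoint, dict):
        for key, value in checkpoint.items():
            if isinstance(value, dict):
                # Treat tensor-valued sub-dicts as state_dicts and count their parameters.
                numel = sum(t.numel() for t in value.values() if torch.is_tensor(t))
                print(f"{key}: dict with {len(value)} entries ({numel:,} tensor elements)")
            else:
                print(f"{key}: {type(value).__name__}")
    else:
        print(type(checkpoint).__name__)

Loading on CPU keeps the inspection independent of the GPU setup used during training, and printing only key names and tensor sizes avoids guessing at fields the log never shows.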