TIR_PROJ/training/runs/ppo_fix_check.log
Johnny Fernandes 19bfac9bd9 Test25_1245
2026-04-25 11:47:37 +00:00

3389 lines
139 KiB

Using cpu device
Logging to runs/ppo_fix_check/ppo_1
------------------------------
| time/ | |
| fps | 5021 |
| iterations | 1 |
| time_elapsed | 3 |
| total_timesteps | 16384 |
------------------------------
------------------------------------------
| time/ | |
| fps | 4241 |
| iterations | 2 |
| time_elapsed | 7 |
| total_timesteps | 32768 |
| train/ | |
| approx_kl | 0.0047510993 |
| clip_fraction | 0.0344 |
| clip_range | 0.2 |
| entropy_loss | -2.85 |
| explained_variance | 0.786 |
| learning_rate | 0.0003 |
| loss | -0.00995 |
| n_updates | 10 |
| policy_gradient_loss | -0.00156 |
| std | 1.01 |
| value_loss | 0.0657 |
------------------------------------------
------------------------------------------
| time/ | |
| fps | 4026 |
| iterations | 3 |
| time_elapsed | 12 |
| total_timesteps | 49152 |
| train/ | |
| approx_kl | 0.0032065492 |
| clip_fraction | 0.0328 |
| clip_range | 0.2 |
| entropy_loss | -2.88 |
| explained_variance | 0.868 |
| learning_rate | 0.0003 |
| loss | -0.0327 |
| n_updates | 20 |
| policy_gradient_loss | -0.00152 |
| std | 1.02 |
| value_loss | 0.0172 |
------------------------------------------
/home/jalf/miniconda3/envs/tir/lib/python3.12/site-packages/stable_baselines3/common/evaluation.py:71: UserWarning: Evaluation environment is not wrapped with a ``Monitor`` wrapper. This may result in reporting modified episode lengths and rewards, if other wrappers happen to modify these. Consider wrapping environment first with ``Monitor`` wrapper.
warnings.warn(
Eval num_timesteps=50000, episode_reward=-25.33 +/- 56.30
Episode length: 1859.00 +/- 393.69
------------------------------------------
| eval/ | |
| mean_ep_length | 1.86e+03 |
| mean_reward | -25.3 |
| time/ | |
| total_timesteps | 50000 |
| train/ | |
| approx_kl | 0.0038272792 |
| clip_fraction | 0.0312 |
| clip_range | 0.2 |
| entropy_loss | -2.89 |
| explained_variance | 0.891 |
| learning_rate | 0.0003 |
| loss | -0.0224 |
| n_updates | 30 |
| policy_gradient_loss | -0.0019 |
| std | 1.02 |
| value_loss | 0.0227 |
------------------------------------------
New best mean reward!
------------------------------
| time/ | |
| fps | 2387 |
| iterations | 4 |
| time_elapsed | 27 |
| total_timesteps | 65536 |
------------------------------
------------------------------------------
| time/ | |
| fps | 2563 |
| iterations | 5 |
| time_elapsed | 31 |
| total_timesteps | 81920 |
| train/ | |
| approx_kl | 0.0040233894 |
| clip_fraction | 0.0323 |
| clip_range | 0.2 |
| entropy_loss | -2.87 |
| explained_variance | 0.878 |
| learning_rate | 0.0003 |
| loss | -0.0251 |
| n_updates | 40 |
| policy_gradient_loss | -0.00247 |
| std | 1.01 |
| value_loss | 0.0169 |
------------------------------------------
-----------------------------------------
| time/ | |
| fps | 2719 |
| iterations | 6 |
| time_elapsed | 36 |
| total_timesteps | 98304 |
| train/ | |
| approx_kl | 0.003573698 |
| clip_fraction | 0.0316 |
| clip_range | 0.2 |
| entropy_loss | -2.86 |
| explained_variance | 0.865 |
| learning_rate | 0.0003 |
| loss | -0.0219 |
| n_updates | 50 |
| policy_gradient_loss | -0.0019 |
| std | 1.01 |
| value_loss | 0.022 |
-----------------------------------------
/home/jalf/miniconda3/envs/tir/lib/python3.12/site-packages/stable_baselines3/common/evaluation.py:71: UserWarning: Evaluation environment is not wrapped with a ``Monitor`` wrapper. This may result in reporting modified episode lengths and rewards, if other wrappers happen to modify these. Consider wrapping environment first with ``Monitor`` wrapper.
warnings.warn(
Eval num_timesteps=100000, episode_reward=-29.60 +/- 36.59
Episode length: 1939.35 +/- 264.37
------------------------------------------
| eval/ | |
| mean_ep_length | 1.94e+03 |
| mean_reward | -29.6 |
| time/ | |
| total_timesteps | 100000 |
| train/ | |
| approx_kl | 0.0046861977 |
| clip_fraction | 0.039 |
| clip_range | 0.2 |
| entropy_loss | -2.86 |
| explained_variance | 0.815 |
| learning_rate | 0.0003 |
| loss | -0.0257 |
| n_updates | 60 |
| policy_gradient_loss | -0.00203 |
| std | 1.01 |
| value_loss | 0.0201 |
------------------------------------------
-------------------------------
| time/ | |
| fps | 2191 |
| iterations | 7 |
| time_elapsed | 52 |
| total_timesteps | 114688 |
-------------------------------
-----------------------------------------
| time/ | |
| fps | 2314 |
| iterations | 8 |
| time_elapsed | 56 |
| total_timesteps | 131072 |
| train/ | |
| approx_kl | 0.005258695 |
| clip_fraction | 0.0503 |
| clip_range | 0.2 |
| entropy_loss | -2.86 |
| explained_variance | 0.807 |
| learning_rate | 0.0003 |
| loss | -0.0211 |
| n_updates | 70 |
| policy_gradient_loss | -0.00398 |
| std | 1.01 |
| value_loss | 0.0164 |
-----------------------------------------
------------------------------------------
| time/ | |
| fps | 2359 |
| iterations | 9 |
| time_elapsed | 62 |
| total_timesteps | 147456 |
| train/ | |
| approx_kl | 0.0043328116 |
| clip_fraction | 0.0332 |
| clip_range | 0.2 |
| entropy_loss | -2.86 |
| explained_variance | 0.811 |
| learning_rate | 0.0003 |
| loss | -0.0259 |
| n_updates | 80 |
| policy_gradient_loss | -0.00173 |
| std | 1.01 |
| value_loss | 0.0121 |
------------------------------------------
Eval num_timesteps=150000, episode_reward=-33.97 +/- 37.15
Episode length: 1954.85 +/- 196.80
-----------------------------------------
| eval/ | |
| mean_ep_length | 1.95e+03 |
| mean_reward | -34 |
| time/ | |
| total_timesteps | 150000 |
| train/ | |
| approx_kl | 0.005169191 |
| clip_fraction | 0.0506 |
| clip_range | 0.2 |
| entropy_loss | -2.85 |
| explained_variance | 0.649 |
| learning_rate | 0.0003 |
| loss | -0.0287 |
| n_updates | 90 |
| policy_gradient_loss | -0.00384 |
| std | 1 |
| value_loss | 0.0162 |
-----------------------------------------
[Diag @ 150,000 | n_sheep=1 | success=15%]
COMPACT_CANT_DRIVE 16/20
SUCCESS 3/20
DROVE_NO_SHEEP 1/20
action_mag mean=0.239 p10=0.071 p90=0.433 (0=stopped, 1=full speed)
min_flock_radius mean=0.00m best=0.00m (target <5m to compact)
min_dog_to_com mean=4.80m best=1.70m (FLEE_DIST=7m)
min_com_to_pen mean=10.22m best=1.50m
reward/step (mean): progress=+0.0013 alignment=+0.0000 pen_bonus=+0.0008 step_cost=-0.0200 complete=+0.0078
-------------------------------
| time/ | |
| fps | 1935 |
| iterations | 10 |
| time_elapsed | 84 |
| total_timesteps | 163840 |
-------------------------------
------------------------------------------
| time/ | |
| fps | 2014 |
| iterations | 11 |
| time_elapsed | 89 |
| total_timesteps | 180224 |
| train/ | |
| approx_kl | 0.0039950563 |
| clip_fraction | 0.0276 |
| clip_range | 0.2 |
| entropy_loss | -2.83 |
| explained_variance | 0.623 |
| learning_rate | 0.0003 |
| loss | -0.0128 |
| n_updates | 100 |
| policy_gradient_loss | -0.00208 |
| std | 0.995 |
| value_loss | 0.0959 |
------------------------------------------
------------------------------------------
| time/ | |
| fps | 2093 |
| iterations | 12 |
| time_elapsed | 93 |
| total_timesteps | 196608 |
| train/ | |
| approx_kl | 0.0036244316 |
| clip_fraction | 0.0299 |
| clip_range | 0.2 |
| entropy_loss | -2.83 |
| explained_variance | 0.916 |
| learning_rate | 0.0003 |
| loss | -0.0251 |
| n_updates | 110 |
| policy_gradient_loss | -0.00229 |
| std | 0.991 |
| value_loss | 0.0118 |
------------------------------------------
Eval num_timesteps=200000, episode_reward=-36.37 +/- 39.41
Episode length: 1950.95 +/- 213.80
-----------------------------------------
| eval/ | |
| mean_ep_length | 1.95e+03 |
| mean_reward | -36.4 |
| time/ | |
| total_timesteps | 200000 |
| train/ | |
| approx_kl | 0.003325508 |
| clip_fraction | 0.0223 |
| clip_range | 0.2 |
| entropy_loss | -2.83 |
| explained_variance | 0.858 |
| learning_rate | 0.0003 |
| loss | -0.0279 |
| n_updates | 120 |
| policy_gradient_loss | -0.0007 |
| std | 0.999 |
| value_loss | 0.0493 |
-----------------------------------------
-------------------------------
| time/ | |
| fps | 1964 |
| iterations | 13 |
| time_elapsed | 108 |
| total_timesteps | 212992 |
-------------------------------
-----------------------------------------
| time/ | |
| fps | 2034 |
| iterations | 14 |
| time_elapsed | 112 |
| total_timesteps | 229376 |
| train/ | |
| approx_kl | 0.004660043 |
| clip_fraction | 0.0403 |
| clip_range | 0.2 |
| entropy_loss | -2.85 |
| explained_variance | 0.719 |
| learning_rate | 0.0003 |
| loss | 0.128 |
| n_updates | 130 |
| policy_gradient_loss | -0.00265 |
| std | 1.01 |
| value_loss | 0.073 |
-----------------------------------------
----------------------------------------
| time/ | |
| fps | 2103 |
| iterations | 15 |
| time_elapsed | 116 |
| total_timesteps | 245760 |
| train/ | |
| approx_kl | 0.00501227 |
| clip_fraction | 0.0499 |
| clip_range | 0.2 |
| entropy_loss | -2.88 |
| explained_variance | 0.847 |
| learning_rate | 0.0003 |
| loss | -0.0237 |
| n_updates | 140 |
| policy_gradient_loss | -0.00264 |
| std | 1.02 |
| value_loss | 0.0415 |
----------------------------------------
Eval num_timesteps=250000, episode_reward=-44.92 +/- 15.63
Episode length: 2000.00 +/- 0.00
------------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -44.9 |
| time/ | |
| total_timesteps | 250000 |
| train/ | |
| approx_kl | 0.0055294414 |
| clip_fraction | 0.06 |
| clip_range | 0.2 |
| entropy_loss | -2.89 |
| explained_variance | 0.951 |
| learning_rate | 0.0003 |
| loss | -0.0274 |
| n_updates | 150 |
| policy_gradient_loss | -0.00491 |
| std | 1.03 |
| value_loss | 0.014 |
------------------------------------------
-------------------------------
| time/ | |
| fps | 1999 |
| iterations | 16 |
| time_elapsed | 131 |
| total_timesteps | 262144 |
-------------------------------
------------------------------------------
| time/ | |
| fps | 2051 |
| iterations | 17 |
| time_elapsed | 135 |
| total_timesteps | 278528 |
| train/ | |
| approx_kl | 0.0051201656 |
| clip_fraction | 0.0301 |
| clip_range | 0.2 |
| entropy_loss | -2.88 |
| explained_variance | 0.941 |
| learning_rate | 0.0003 |
| loss | 0.148 |
| n_updates | 160 |
| policy_gradient_loss | -0.00199 |
| std | 1.02 |
| value_loss | 0.099 |
------------------------------------------
-----------------------------------------
| time/ | |
| fps | 2096 |
| iterations | 18 |
| time_elapsed | 140 |
| total_timesteps | 294912 |
| train/ | |
| approx_kl | 0.004261789 |
| clip_fraction | 0.0328 |
| clip_range | 0.2 |
| entropy_loss | -2.88 |
| explained_variance | 0.942 |
| learning_rate | 0.0003 |
| loss | -0.0314 |
| n_updates | 170 |
| policy_gradient_loss | -0.00243 |
| std | 1.02 |
| value_loss | 0.0117 |
-----------------------------------------
Eval num_timesteps=300000, episode_reward=-44.79 +/- 17.68
Episode length: 2000.00 +/- 0.00
-----------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -44.8 |
| time/ | |
| total_timesteps | 300000 |
| train/ | |
| approx_kl | 0.004783842 |
| clip_fraction | 0.0296 |
| clip_range | 0.2 |
| entropy_loss | -2.87 |
| explained_variance | 0.892 |
| learning_rate | 0.0003 |
| loss | -0.0219 |
| n_updates | 180 |
| policy_gradient_loss | -0.00159 |
| std | 1.01 |
| value_loss | 0.0497 |
-----------------------------------------
[Diag @ 300,000 | n_sheep=1 | success=0%]
COMPACT_CANT_DRIVE 17/20
DROVE_NO_SHEEP 3/20
action_mag mean=0.241 p10=0.109 p90=0.389 (0=stopped, 1=full speed)
min_flock_radius mean=0.00m best=0.00m (target <5m to compact)
min_dog_to_com mean=4.77m best=2.12m (FLEE_DIST=7m)
min_com_to_pen mean=9.31m best=1.50m
reward/step (mean): progress=+0.0016 alignment=+0.0000 pen_bonus=+0.0000 step_cost=-0.0200 complete=+0.0000
-------------------------------
| time/ | |
| fps | 1905 |
| iterations | 19 |
| time_elapsed | 163 |
| total_timesteps | 311296 |
-------------------------------
------------------------------------------
| time/ | |
| fps | 1949 |
| iterations | 20 |
| time_elapsed | 168 |
| total_timesteps | 327680 |
| train/ | |
| approx_kl | 0.0033368056 |
| clip_fraction | 0.0258 |
| clip_range | 0.2 |
| entropy_loss | -2.87 |
| explained_variance | 0.794 |
| learning_rate | 0.0003 |
| loss | -0.0211 |
| n_updates | 190 |
| policy_gradient_loss | -0.00105 |
| std | 1.02 |
| value_loss | 0.0769 |
------------------------------------------
------------------------------------------
| time/ | |
| fps | 1992 |
| iterations | 21 |
| time_elapsed | 172 |
| total_timesteps | 344064 |
| train/ | |
| approx_kl | 0.0046488494 |
| clip_fraction | 0.0352 |
| clip_range | 0.2 |
| entropy_loss | -2.87 |
| explained_variance | 0.927 |
| learning_rate | 0.0003 |
| loss | -0.0274 |
| n_updates | 200 |
| policy_gradient_loss | -0.00331 |
| std | 1.02 |
| value_loss | 0.0165 |
------------------------------------------
Eval num_timesteps=350000, episode_reward=-24.90 +/- 50.25
Episode length: 1976.75 +/- 82.03
------------------------------------------
| eval/ | |
| mean_ep_length | 1.98e+03 |
| mean_reward | -24.9 |
| time/ | |
| total_timesteps | 350000 |
| train/ | |
| approx_kl | 0.0041725934 |
| clip_fraction | 0.0299 |
| clip_range | 0.2 |
| entropy_loss | -2.88 |
| explained_variance | 0.944 |
| learning_rate | 0.0003 |
| loss | -0.026 |
| n_updates | 210 |
| policy_gradient_loss | -0.0026 |
| std | 1.02 |
| value_loss | 0.00665 |
------------------------------------------
New best mean reward!
-------------------------------
| time/ | |
| fps | 1921 |
| iterations | 22 |
| time_elapsed | 187 |
| total_timesteps | 360448 |
-------------------------------
-----------------------------------------
| time/ | |
| fps | 1963 |
| iterations | 23 |
| time_elapsed | 191 |
| total_timesteps | 376832 |
| train/ | |
| approx_kl | 0.005180447 |
| clip_fraction | 0.0532 |
| clip_range | 0.2 |
| entropy_loss | -2.87 |
| explained_variance | 0.956 |
| learning_rate | 0.0003 |
| loss | -0.0255 |
| n_updates | 220 |
| policy_gradient_loss | -0.00352 |
| std | 1.02 |
| value_loss | 0.0142 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 1990 |
| iterations | 24 |
| time_elapsed | 197 |
| total_timesteps | 393216 |
| train/ | |
| approx_kl | 0.004661506 |
| clip_fraction | 0.0443 |
| clip_range | 0.2 |
| entropy_loss | -2.87 |
| explained_variance | 0.967 |
| learning_rate | 0.0003 |
| loss | -0.0331 |
| n_updates | 230 |
| policy_gradient_loss | -0.00441 |
| std | 1.02 |
| value_loss | 0.0112 |
-----------------------------------------
Eval num_timesteps=400000, episode_reward=-26.04 +/- 47.69
Episode length: 1890.85 +/- 367.20
-----------------------------------------
| eval/ | |
| mean_ep_length | 1.89e+03 |
| mean_reward | -26 |
| time/ | |
| total_timesteps | 400000 |
| train/ | |
| approx_kl | 0.005491742 |
| clip_fraction | 0.0538 |
| clip_range | 0.2 |
| entropy_loss | -2.89 |
| explained_variance | 0.941 |
| learning_rate | 0.0003 |
| loss | -0.042 |
| n_updates | 240 |
| policy_gradient_loss | -0.00297 |
| std | 1.03 |
| value_loss | 0.00877 |
-----------------------------------------
-------------------------------
| time/ | |
| fps | 1927 |
| iterations | 25 |
| time_elapsed | 212 |
| total_timesteps | 409600 |
-------------------------------
------------------------------------------
| time/ | |
| fps | 1966 |
| iterations | 26 |
| time_elapsed | 216 |
| total_timesteps | 425984 |
| train/ | |
| approx_kl | 0.0045445506 |
| clip_fraction | 0.0385 |
| clip_range | 0.2 |
| entropy_loss | -2.91 |
| explained_variance | 0.941 |
| learning_rate | 0.0003 |
| loss | -0.0343 |
| n_updates | 250 |
| policy_gradient_loss | -0.00307 |
| std | 1.04 |
| value_loss | 0.00818 |
------------------------------------------
------------------------------------------
| time/ | |
| fps | 2004 |
| iterations | 27 |
| time_elapsed | 220 |
| total_timesteps | 442368 |
| train/ | |
| approx_kl | 0.0045271795 |
| clip_fraction | 0.0373 |
| clip_range | 0.2 |
| entropy_loss | -2.94 |
| explained_variance | 0.97 |
| learning_rate | 0.0003 |
| loss | -0.0361 |
| n_updates | 260 |
| policy_gradient_loss | -0.00236 |
| std | 1.05 |
| value_loss | 0.0091 |
------------------------------------------
Eval num_timesteps=450000, episode_reward=-24.58 +/- 48.73
Episode length: 1907.85 +/- 276.46
------------------------------------------
| eval/ | |
| mean_ep_length | 1.91e+03 |
| mean_reward | -24.6 |
| time/ | |
| total_timesteps | 450000 |
| train/ | |
| approx_kl | 0.0052676853 |
| clip_fraction | 0.0498 |
| clip_range | 0.2 |
| entropy_loss | -2.96 |
| explained_variance | 0.948 |
| learning_rate | 0.0003 |
| loss | -0.0261 |
| n_updates | 270 |
| policy_gradient_loss | -0.00236 |
| std | 1.07 |
| value_loss | 0.0286 |
------------------------------------------
New best mean reward!
[Diag @ 450,000 | n_sheep=1 | success=5%]
COMPACT_CANT_DRIVE 18/20
DROVE_NO_SHEEP 1/20
SUCCESS 1/20
action_mag mean=0.272 p10=0.139 p90=0.407 (0=stopped, 1=full speed)
min_flock_radius mean=0.00m best=0.00m (target <5m to compact)
min_dog_to_com mean=4.81m best=1.54m (FLEE_DIST=7m)
min_com_to_pen mean=12.36m best=1.96m
reward/step (mean): progress=+0.0012 alignment=+0.0000 pen_bonus=+0.0003 step_cost=-0.0200 complete=+0.0025
-------------------------------
| time/ | |
| fps | 1893 |
| iterations | 28 |
| time_elapsed | 242 |
| total_timesteps | 458752 |
-------------------------------
-----------------------------------------
| time/ | |
| fps | 1928 |
| iterations | 29 |
| time_elapsed | 246 |
| total_timesteps | 475136 |
| train/ | |
| approx_kl | 0.004465497 |
| clip_fraction | 0.0376 |
| clip_range | 0.2 |
| entropy_loss | -2.97 |
| explained_variance | 0.948 |
| learning_rate | 0.0003 |
| loss | -0.0307 |
| n_updates | 280 |
| policy_gradient_loss | -0.00259 |
| std | 1.07 |
| value_loss | 0.0213 |
-----------------------------------------
------------------------------------------
| time/ | |
| fps | 1961 |
| iterations | 30 |
| time_elapsed | 250 |
| total_timesteps | 491520 |
| train/ | |
| approx_kl | 0.0054338034 |
| clip_fraction | 0.0512 |
| clip_range | 0.2 |
| entropy_loss | -2.97 |
| explained_variance | 0.967 |
| learning_rate | 0.0003 |
| loss | -0.021 |
| n_updates | 290 |
| policy_gradient_loss | -0.00296 |
| std | 1.07 |
| value_loss | 0.0138 |
------------------------------------------
Eval num_timesteps=500000, episode_reward=-44.13 +/- 20.75
Episode length: 2000.00 +/- 0.00
-----------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -44.1 |
| time/ | |
| total_timesteps | 500000 |
| train/ | |
| approx_kl | 0.006292434 |
| clip_fraction | 0.0572 |
| clip_range | 0.2 |
| entropy_loss | -2.97 |
| explained_variance | 0.937 |
| learning_rate | 0.0003 |
| loss | -0.0398 |
| n_updates | 300 |
| policy_gradient_loss | -0.00516 |
| std | 1.07 |
| value_loss | 0.00832 |
-----------------------------------------
-------------------------------
| time/ | |
| fps | 1913 |
| iterations | 31 |
| time_elapsed | 265 |
| total_timesteps | 507904 |
-------------------------------
------------------------------------------
| time/ | |
| fps | 1940 |
| iterations | 32 |
| time_elapsed | 270 |
| total_timesteps | 524288 |
| train/ | |
| approx_kl | 0.0063960385 |
| clip_fraction | 0.0702 |
| clip_range | 0.2 |
| entropy_loss | -2.96 |
| explained_variance | 0.942 |
| learning_rate | 0.0003 |
| loss | -0.0341 |
| n_updates | 310 |
| policy_gradient_loss | -0.00436 |
| std | 1.06 |
| value_loss | 0.0189 |
------------------------------------------
------------------------------------------
| time/ | |
| fps | 1968 |
| iterations | 33 |
| time_elapsed | 274 |
| total_timesteps | 540672 |
| train/ | |
| approx_kl | 0.0070166546 |
| clip_fraction | 0.0888 |
| clip_range | 0.2 |
| entropy_loss | -2.96 |
| explained_variance | 0.955 |
| learning_rate | 0.0003 |
| loss | -0.0376 |
| n_updates | 320 |
| policy_gradient_loss | -0.00631 |
| std | 1.06 |
| value_loss | 0.00861 |
------------------------------------------
Eval num_timesteps=550000, episode_reward=-38.60 +/- 14.53
Episode length: 2000.00 +/- 0.00
------------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -38.6 |
| time/ | |
| total_timesteps | 550000 |
| train/ | |
| approx_kl | 0.0068266992 |
| clip_fraction | 0.075 |
| clip_range | 0.2 |
| entropy_loss | -2.96 |
| explained_variance | 0.959 |
| learning_rate | 0.0003 |
| loss | -0.0252 |
| n_updates | 330 |
| policy_gradient_loss | -0.00593 |
| std | 1.07 |
| value_loss | 0.0131 |
------------------------------------------
-------------------------------
| time/ | |
| fps | 1922 |
| iterations | 34 |
| time_elapsed | 289 |
| total_timesteps | 557056 |
-------------------------------
-----------------------------------------
| time/ | |
| fps | 1950 |
| iterations | 35 |
| time_elapsed | 294 |
| total_timesteps | 573440 |
| train/ | |
| approx_kl | 0.006152669 |
| clip_fraction | 0.0626 |
| clip_range | 0.2 |
| entropy_loss | -2.97 |
| explained_variance | 0.954 |
| learning_rate | 0.0003 |
| loss | -0.0376 |
| n_updates | 340 |
| policy_gradient_loss | -0.00514 |
| std | 1.07 |
| value_loss | 0.0187 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 1977 |
| iterations | 36 |
| time_elapsed | 298 |
| total_timesteps | 589824 |
| train/ | |
| approx_kl | 0.006685758 |
| clip_fraction | 0.0729 |
| clip_range | 0.2 |
| entropy_loss | -2.97 |
| explained_variance | 0.958 |
| learning_rate | 0.0003 |
| loss | -0.0387 |
| n_updates | 350 |
| policy_gradient_loss | -0.00632 |
| std | 1.07 |
| value_loss | 0.0118 |
-----------------------------------------
Eval num_timesteps=600000, episode_reward=-31.39 +/- 8.94
Episode length: 2000.00 +/- 0.00
-----------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -31.4 |
| time/ | |
| total_timesteps | 600000 |
| train/ | |
| approx_kl | 0.008094068 |
| clip_fraction | 0.0985 |
| clip_range | 0.2 |
| entropy_loss | -2.97 |
| explained_variance | 0.937 |
| learning_rate | 0.0003 |
| loss | -0.0439 |
| n_updates | 360 |
| policy_gradient_loss | -0.00782 |
| std | 1.07 |
| value_loss | 0.0116 |
-----------------------------------------
[Diag @ 600,000 | n_sheep=1 | success=5%]
COMPACT_CANT_DRIVE 16/20
DROVE_NO_SHEEP 3/20
SUCCESS 1/20
action_mag mean=0.150 p10=0.000 p90=0.392 (0=stopped, 1=full speed)
min_flock_radius mean=0.00m best=0.00m (target <5m to compact)
min_dog_to_com mean=3.64m best=0.68m (FLEE_DIST=7m)
min_com_to_pen mean=10.60m best=1.50m
reward/step (mean): progress=+0.0025 alignment=+0.0000 pen_bonus=+0.0003 step_cost=-0.0200 complete=+0.0026
[Curriculum] leaving stage n_sheep=1 after 600,000 steps | training success rate (last 100 eps) = 9%
[Curriculum] → 2 sheep at step 600,000
-------------------------------
| time/ | |
| fps | 1894 |
| iterations | 37 |
| time_elapsed | 319 |
| total_timesteps | 606208 |
-------------------------------
------------------------------------------
| time/ | |
| fps | 1917 |
| iterations | 38 |
| time_elapsed | 324 |
| total_timesteps | 622592 |
| train/ | |
| approx_kl | 0.0067913756 |
| clip_fraction | 0.0689 |
| clip_range | 0.2 |
| entropy_loss | -2.97 |
| explained_variance | 0.861 |
| learning_rate | 0.0003 |
| loss | 0.0772 |
| n_updates | 370 |
| policy_gradient_loss | -0.00184 |
| std | 1.07 |
| value_loss | 0.101 |
------------------------------------------
------------------------------------------
| time/ | |
| fps | 1938 |
| iterations | 39 |
| time_elapsed | 329 |
| total_timesteps | 638976 |
| train/ | |
| approx_kl | 0.0061344057 |
| clip_fraction | 0.0666 |
| clip_range | 0.2 |
| entropy_loss | -2.98 |
| explained_variance | 0.928 |
| learning_rate | 0.0003 |
| loss | -0.0147 |
| n_updates | 380 |
| policy_gradient_loss | -0.00148 |
| std | 1.08 |
| value_loss | 0.0386 |
------------------------------------------
Eval num_timesteps=650000, episode_reward=-42.39 +/- 31.99
Episode length: 2000.00 +/- 0.00
------------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -42.4 |
| time/ | |
| total_timesteps | 650000 |
| train/ | |
| approx_kl | 0.0061708866 |
| clip_fraction | 0.06 |
| clip_range | 0.2 |
| entropy_loss | -2.98 |
| explained_variance | 0.918 |
| learning_rate | 0.0003 |
| loss | -0.0203 |
| n_updates | 390 |
| policy_gradient_loss | -0.00313 |
| std | 1.07 |
| value_loss | 0.0242 |
------------------------------------------
-------------------------------
| time/ | |
| fps | 1896 |
| iterations | 40 |
| time_elapsed | 345 |
| total_timesteps | 655360 |
-------------------------------
-----------------------------------------
| time/ | |
| fps | 1918 |
| iterations | 41 |
| time_elapsed | 350 |
| total_timesteps | 671744 |
| train/ | |
| approx_kl | 0.007122565 |
| clip_fraction | 0.0765 |
| clip_range | 0.2 |
| entropy_loss | -2.98 |
| explained_variance | 0.855 |
| learning_rate | 0.0003 |
| loss | -0.00749 |
| n_updates | 400 |
| policy_gradient_loss | -0.00529 |
| std | 1.07 |
| value_loss | 0.0596 |
-----------------------------------------
------------------------------------------
| time/ | |
| fps | 1941 |
| iterations | 42 |
| time_elapsed | 354 |
| total_timesteps | 688128 |
| train/ | |
| approx_kl | 0.0078532845 |
| clip_fraction | 0.0975 |
| clip_range | 0.2 |
| entropy_loss | -2.98 |
| explained_variance | 0.89 |
| learning_rate | 0.0003 |
| loss | -0.0188 |
| n_updates | 410 |
| policy_gradient_loss | -0.00699 |
| std | 1.07 |
| value_loss | 0.0207 |
------------------------------------------
Eval num_timesteps=700000, episode_reward=-39.79 +/- 29.60
Episode length: 2000.00 +/- 0.00
------------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -39.8 |
| time/ | |
| total_timesteps | 700000 |
| train/ | |
| approx_kl | 0.0073551387 |
| clip_fraction | 0.084 |
| clip_range | 0.2 |
| entropy_loss | -2.97 |
| explained_variance | 0.824 |
| learning_rate | 0.0003 |
| loss | 0.0126 |
| n_updates | 420 |
| policy_gradient_loss | -0.0064 |
| std | 1.06 |
| value_loss | 0.0438 |
------------------------------------------
-------------------------------
| time/ | |
| fps | 1904 |
| iterations | 43 |
| time_elapsed | 370 |
| total_timesteps | 704512 |
-------------------------------
-----------------------------------------
| time/ | |
| fps | 1922 |
| iterations | 44 |
| time_elapsed | 375 |
| total_timesteps | 720896 |
| train/ | |
| approx_kl | 0.006614036 |
| clip_fraction | 0.0611 |
| clip_range | 0.2 |
| entropy_loss | -2.95 |
| explained_variance | 0.881 |
| learning_rate | 0.0003 |
| loss | -0.0207 |
| n_updates | 430 |
| policy_gradient_loss | -0.00371 |
| std | 1.06 |
| value_loss | 0.0244 |
-----------------------------------------
------------------------------------------
| time/ | |
| fps | 1940 |
| iterations | 45 |
| time_elapsed | 380 |
| total_timesteps | 737280 |
| train/ | |
| approx_kl | 0.0060790265 |
| clip_fraction | 0.0591 |
| clip_range | 0.2 |
| entropy_loss | -2.95 |
| explained_variance | 0.885 |
| learning_rate | 0.0003 |
| loss | -0.0284 |
| n_updates | 440 |
| policy_gradient_loss | -0.00447 |
| std | 1.06 |
| value_loss | 0.0206 |
------------------------------------------
Eval num_timesteps=750000, episode_reward=-40.21 +/- 27.55
Episode length: 2000.00 +/- 0.00
------------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -40.2 |
| time/ | |
| total_timesteps | 750000 |
| train/ | |
| approx_kl | 0.0066163363 |
| clip_fraction | 0.0691 |
| clip_range | 0.2 |
| entropy_loss | -2.96 |
| explained_variance | 0.924 |
| learning_rate | 0.0003 |
| loss | -0.032 |
| n_updates | 450 |
| policy_gradient_loss | -0.0043 |
| std | 1.06 |
| value_loss | 0.0127 |
------------------------------------------
[Diag @ 750,000 | n_sheep=2 | success=0%]
COMPACT_CANT_DRIVE 14/20
NEVER_COMPACT 5/20
DROVE_NO_SHEEP 1/20
action_mag mean=0.313 p10=0.081 p90=0.638 (0=stopped, 1=full speed)
min_flock_radius mean=2.72m best=0.00m (target <5m to compact)
min_dog_to_com mean=3.96m best=0.02m (FLEE_DIST=7m)
min_com_to_pen mean=12.68m best=2.17m
reward/step (mean): progress=-0.0005 alignment=+0.0000 pen_bonus=+0.0008 step_cost=-0.0200 complete=+0.0000
-------------------------------
| time/ | |
| fps | 1866 |
| iterations | 46 |
| time_elapsed | 403 |
| total_timesteps | 753664 |
-------------------------------
-----------------------------------------
| time/ | |
| fps | 1887 |
| iterations | 47 |
| time_elapsed | 407 |
| total_timesteps | 770048 |
| train/ | |
| approx_kl | 0.005094421 |
| clip_fraction | 0.0496 |
| clip_range | 0.2 |
| entropy_loss | -2.96 |
| explained_variance | 0.917 |
| learning_rate | 0.0003 |
| loss | -0.0237 |
| n_updates | 460 |
| policy_gradient_loss | -0.00332 |
| std | 1.06 |
| value_loss | 0.0275 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 1906 |
| iterations | 48 |
| time_elapsed | 412 |
| total_timesteps | 786432 |
| train/ | |
| approx_kl | 0.006302662 |
| clip_fraction | 0.0571 |
| clip_range | 0.2 |
| entropy_loss | -2.94 |
| explained_variance | 0.944 |
| learning_rate | 0.0003 |
| loss | -0.0353 |
| n_updates | 470 |
| policy_gradient_loss | -0.00424 |
| std | 1.05 |
| value_loss | 0.0201 |
-----------------------------------------
Eval num_timesteps=800000, episode_reward=-31.43 +/- 45.97
Episode length: 1953.35 +/- 203.34
------------------------------------------
| eval/ | |
| mean_ep_length | 1.95e+03 |
| mean_reward | -31.4 |
| time/ | |
| total_timesteps | 800000 |
| train/ | |
| approx_kl | 0.0055750986 |
| clip_fraction | 0.0494 |
| clip_range | 0.2 |
| entropy_loss | -2.95 |
| explained_variance | 0.959 |
| learning_rate | 0.0003 |
| loss | -0.0262 |
| n_updates | 480 |
| policy_gradient_loss | -0.00386 |
| std | 1.06 |
| value_loss | 0.0218 |
------------------------------------------
-------------------------------
| time/ | |
| fps | 1878 |
| iterations | 49 |
| time_elapsed | 427 |
| total_timesteps | 802816 |
-------------------------------
------------------------------------------
| time/ | |
| fps | 1897 |
| iterations | 50 |
| time_elapsed | 431 |
| total_timesteps | 819200 |
| train/ | |
| approx_kl | 0.0057711033 |
| clip_fraction | 0.0568 |
| clip_range | 0.2 |
| entropy_loss | -2.95 |
| explained_variance | 0.838 |
| learning_rate | 0.0003 |
| loss | -0.0362 |
| n_updates | 490 |
| policy_gradient_loss | -0.00438 |
| std | 1.06 |
| value_loss | 0.00952 |
------------------------------------------
------------------------------------------
| time/ | |
| fps | 1914 |
| iterations | 51 |
| time_elapsed | 436 |
| total_timesteps | 835584 |
| train/ | |
| approx_kl | 0.0073408587 |
| clip_fraction | 0.077 |
| clip_range | 0.2 |
| entropy_loss | -2.96 |
| explained_variance | 0.931 |
| learning_rate | 0.0003 |
| loss | -0.0283 |
| n_updates | 500 |
| policy_gradient_loss | -0.00553 |
| std | 1.07 |
| value_loss | 0.0142 |
------------------------------------------
Eval num_timesteps=850000, episode_reward=-37.98 +/- 27.04
Episode length: 2000.00 +/- 0.00
------------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -38 |
| time/ | |
| total_timesteps | 850000 |
| train/ | |
| approx_kl | 0.0055803536 |
| clip_fraction | 0.0536 |
| clip_range | 0.2 |
| entropy_loss | -2.96 |
| explained_variance | 0.931 |
| learning_rate | 0.0003 |
| loss | -0.0338 |
| n_updates | 510 |
| policy_gradient_loss | -0.00469 |
| std | 1.06 |
| value_loss | 0.0156 |
------------------------------------------
-------------------------------
| time/ | |
| fps | 1884 |
| iterations | 52 |
| time_elapsed | 452 |
| total_timesteps | 851968 |
-------------------------------
----------------------------------------
| time/ | |
| fps | 1899 |
| iterations | 53 |
| time_elapsed | 457 |
| total_timesteps | 868352 |
| train/ | |
| approx_kl | 0.00585186 |
| clip_fraction | 0.0638 |
| clip_range | 0.2 |
| entropy_loss | -2.97 |
| explained_variance | 0.83 |
| learning_rate | 0.0003 |
| loss | -0.0333 |
| n_updates | 520 |
| policy_gradient_loss | -0.00395 |
| std | 1.07 |
| value_loss | 0.0322 |
----------------------------------------
------------------------------------------
| time/ | |
| fps | 1915 |
| iterations | 54 |
| time_elapsed | 461 |
| total_timesteps | 884736 |
| train/ | |
| approx_kl | 0.0055105407 |
| clip_fraction | 0.045 |
| clip_range | 0.2 |
| entropy_loss | -2.96 |
| explained_variance | 0.845 |
| learning_rate | 0.0003 |
| loss | -0.0283 |
| n_updates | 530 |
| policy_gradient_loss | -0.00367 |
| std | 1.06 |
| value_loss | 0.0109 |
------------------------------------------
Eval num_timesteps=900000, episode_reward=-41.53 +/- 35.40
Episode length: 2000.00 +/- 0.00
------------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -41.5 |
| time/ | |
| total_timesteps | 900000 |
| train/ | |
| approx_kl | 0.0064837057 |
| clip_fraction | 0.0625 |
| clip_range | 0.2 |
| entropy_loss | -2.96 |
| explained_variance | 0.909 |
| learning_rate | 0.0003 |
| loss | -0.0394 |
| n_updates | 540 |
| policy_gradient_loss | -0.00409 |
| std | 1.06 |
| value_loss | 0.0147 |
------------------------------------------
[Diag @ 900,000 | n_sheep=2 | success=0%]
COMPACT_CANT_DRIVE 12/20
NEVER_COMPACT 8/20
action_mag mean=0.276 p10=0.038 p90=0.580 (0=stopped, 1=full speed)
min_flock_radius mean=4.30m best=0.98m (target <5m to compact)
min_dog_to_com mean=3.24m best=0.24m (FLEE_DIST=7m)
min_com_to_pen mean=12.15m best=5.60m
reward/step (mean): progress=-0.0048 alignment=+0.0000 pen_bonus=+0.0000 step_cost=-0.0200 complete=+0.0000
-------------------------------
| time/ | |
| fps | 1857 |
| iterations | 55 |
| time_elapsed | 485 |
| total_timesteps | 901120 |
-------------------------------
-----------------------------------------
| time/ | |
| fps | 1874 |
| iterations | 56 |
| time_elapsed | 489 |
| total_timesteps | 917504 |
| train/ | |
| approx_kl | 0.006582682 |
| clip_fraction | 0.0662 |
| clip_range | 0.2 |
| entropy_loss | -2.97 |
| explained_variance | 0.961 |
| learning_rate | 0.0003 |
| loss | -0.039 |
| n_updates | 550 |
| policy_gradient_loss | -0.00462 |
| std | 1.07 |
| value_loss | 0.0103 |
-----------------------------------------
------------------------------------------
| time/ | |
| fps | 1888 |
| iterations | 57 |
| time_elapsed | 494 |
| total_timesteps | 933888 |
| train/ | |
| approx_kl | 0.0059698187 |
| clip_fraction | 0.0573 |
| clip_range | 0.2 |
| entropy_loss | -2.97 |
| explained_variance | 0.907 |
| learning_rate | 0.0003 |
| loss | -0.0291 |
| n_updates | 560 |
| policy_gradient_loss | -0.00446 |
| std | 1.07 |
| value_loss | 0.0113 |
------------------------------------------
Eval num_timesteps=950000, episode_reward=-26.73 +/- 22.82
Episode length: 2000.00 +/- 0.00
-----------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -26.7 |
| time/ | |
| total_timesteps | 950000 |
| train/ | |
| approx_kl | 0.006601461 |
| clip_fraction | 0.0594 |
| clip_range | 0.2 |
| entropy_loss | -2.96 |
| explained_variance | 0.872 |
| learning_rate | 0.0003 |
| loss | -0.034 |
| n_updates | 570 |
| policy_gradient_loss | -0.00455 |
| std | 1.06 |
| value_loss | 0.00901 |
-----------------------------------------
-------------------------------
| time/ | |
| fps | 1856 |
| iterations | 58 |
| time_elapsed | 511 |
| total_timesteps | 950272 |
-------------------------------
-----------------------------------------
| time/ | |
| fps | 1869 |
| iterations | 59 |
| time_elapsed | 517 |
| total_timesteps | 966656 |
| train/ | |
| approx_kl | 0.005824944 |
| clip_fraction | 0.0624 |
| clip_range | 0.2 |
| entropy_loss | -2.96 |
| explained_variance | 0.789 |
| learning_rate | 0.0003 |
| loss | -0.0214 |
| n_updates | 580 |
| policy_gradient_loss | -0.00363 |
| std | 1.07 |
| value_loss | 0.0359 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 1882 |
| iterations | 60 |
| time_elapsed | 522 |
| total_timesteps | 983040 |
| train/ | |
| approx_kl | 0.005888001 |
| clip_fraction | 0.0573 |
| clip_range | 0.2 |
| entropy_loss | -2.98 |
| explained_variance | 0.887 |
| learning_rate | 0.0003 |
| loss | -0.0391 |
| n_updates | 590 |
| policy_gradient_loss | -0.00371 |
| std | 1.07 |
| value_loss | 0.00935 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 1895 |
| iterations | 61 |
| time_elapsed | 527 |
| total_timesteps | 999424 |
| train/ | |
| approx_kl | 0.005874036 |
| clip_fraction | 0.0611 |
| clip_range | 0.2 |
| entropy_loss | -2.98 |
| explained_variance | 0.871 |
| learning_rate | 0.0003 |
| loss | -0.0246 |
| n_updates | 600 |
| policy_gradient_loss | -0.00492 |
| std | 1.07 |
| value_loss | 0.00877 |
-----------------------------------------
Eval num_timesteps=1000000, episode_reward=-22.72 +/- 33.15
Episode length: 2000.00 +/- 0.00
------------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -22.7 |
| time/ | |
| total_timesteps | 1000000 |
| train/ | |
| approx_kl | 0.0060388125 |
| clip_fraction | 0.0637 |
| clip_range | 0.2 |
| entropy_loss | -2.97 |
| explained_variance | 0.737 |
| learning_rate | 0.0003 |
| loss | -0.0511 |
| n_updates | 610 |
| policy_gradient_loss | -0.00387 |
| std | 1.07 |
| value_loss | 0.0538 |
------------------------------------------
New best mean reward!
--------------------------------
| time/ | |
| fps | 1869 |
| iterations | 62 |
| time_elapsed | 543 |
| total_timesteps | 1015808 |
--------------------------------
-----------------------------------------
| time/ | |
| fps | 1882 |
| iterations | 63 |
| time_elapsed | 548 |
| total_timesteps | 1032192 |
| train/ | |
| approx_kl | 0.007320485 |
| clip_fraction | 0.0723 |
| clip_range | 0.2 |
| entropy_loss | -2.99 |
| explained_variance | 0.946 |
| learning_rate | 0.0003 |
| loss | -0.0342 |
| n_updates | 620 |
| policy_gradient_loss | -0.0052 |
| std | 1.08 |
| value_loss | 0.0174 |
-----------------------------------------
------------------------------------------
| time/ | |
| fps | 1894 |
| iterations | 64 |
| time_elapsed | 553 |
| total_timesteps | 1048576 |
| train/ | |
| approx_kl | 0.0066477214 |
| clip_fraction | 0.0621 |
| clip_range | 0.2 |
| entropy_loss | -3 |
| explained_variance | 0.919 |
| learning_rate | 0.0003 |
| loss | -0.0301 |
| n_updates | 630 |
| policy_gradient_loss | -0.00449 |
| std | 1.08 |
| value_loss | 0.0109 |
------------------------------------------
Eval num_timesteps=1050000, episode_reward=-39.86 +/- 28.77
Episode length: 2000.00 +/- 0.00
------------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -39.9 |
| time/ | |
| total_timesteps | 1050000 |
| train/ | |
| approx_kl | 0.0066243596 |
| clip_fraction | 0.0772 |
| clip_range | 0.2 |
| entropy_loss | -2.99 |
| explained_variance | 0.861 |
| learning_rate | 0.0003 |
| loss | -0.0313 |
| n_updates | 640 |
| policy_gradient_loss | -0.00462 |
| std | 1.07 |
| value_loss | 0.0324 |
------------------------------------------
[Diag @ 1,050,000 | n_sheep=2 | success=0%]
COMPACT_CANT_DRIVE 18/20
NEVER_COMPACT 2/20
action_mag mean=0.200 p10=0.022 p90=0.478 (0=stopped, 1=full speed)
min_flock_radius mean=2.29m best=0.00m (target <5m to compact)
min_dog_to_com mean=3.23m best=0.05m (FLEE_DIST=7m)
min_com_to_pen mean=12.84m best=3.77m
reward/step (mean): progress=+0.0016 alignment=+0.0000 pen_bonus=+0.0003 step_cost=-0.0200 complete=+0.0000
--------------------------------
| time/ | |
| fps | 1843 |
| iterations | 65 |
| time_elapsed | 577 |
| total_timesteps | 1064960 |
--------------------------------
------------------------------------------
| time/ | |
| fps | 1855 |
| iterations | 66 |
| time_elapsed | 582 |
| total_timesteps | 1081344 |
| train/ | |
| approx_kl | 0.0066154073 |
| clip_fraction | 0.0657 |
| clip_range | 0.2 |
| entropy_loss | -2.99 |
| explained_variance | 0.836 |
| learning_rate | 0.0003 |
| loss | -0.029 |
| n_updates | 650 |
| policy_gradient_loss | -0.0049 |
| std | 1.08 |
| value_loss | 0.0135 |
------------------------------------------
------------------------------------------
| time/ | |
| fps | 1865 |
| iterations | 67 |
| time_elapsed | 588 |
| total_timesteps | 1097728 |
| train/ | |
| approx_kl | 0.0059733046 |
| clip_fraction | 0.0634 |
| clip_range | 0.2 |
| entropy_loss | -3.01 |
| explained_variance | 0.852 |
| learning_rate | 0.0003 |
| loss | -0.0254 |
| n_updates | 660 |
| policy_gradient_loss | -0.00452 |
| std | 1.09 |
| value_loss | 0.0395 |
------------------------------------------
Eval num_timesteps=1100000, episode_reward=-33.30 +/- 26.65
Episode length: 2000.00 +/- 0.00
------------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -33.3 |
| time/ | |
| total_timesteps | 1100000 |
| train/ | |
| approx_kl | 0.0054050894 |
| clip_fraction | 0.048 |
| clip_range | 0.2 |
| entropy_loss | -3.02 |
| explained_variance | 0.851 |
| learning_rate | 0.0003 |
| loss | -0.0348 |
| n_updates | 670 |
| policy_gradient_loss | -0.00385 |
| std | 1.1 |
| value_loss | 0.0247 |
------------------------------------------
--------------------------------
| time/ | |
| fps | 1843 |
| iterations | 68 |
| time_elapsed | 604 |
| total_timesteps | 1114112 |
--------------------------------
------------------------------------------
| time/ | |
| fps | 1856 |
| iterations | 69 |
| time_elapsed | 608 |
| total_timesteps | 1130496 |
| train/ | |
| approx_kl | 0.0073612374 |
| clip_fraction | 0.076 |
| clip_range | 0.2 |
| entropy_loss | -3.01 |
| explained_variance | 0.885 |
| learning_rate | 0.0003 |
| loss | -0.0424 |
| n_updates | 680 |
| policy_gradient_loss | -0.00512 |
| std | 1.09 |
| value_loss | 0.0278 |
------------------------------------------
------------------------------------------
| time/ | |
| fps | 1869 |
| iterations | 70 |
| time_elapsed | 613 |
| total_timesteps | 1146880 |
| train/ | |
| approx_kl | 0.0063554104 |
| clip_fraction | 0.067 |
| clip_range | 0.2 |
| entropy_loss | -3.01 |
| explained_variance | 0.915 |
| learning_rate | 0.0003 |
| loss | -0.0302 |
| n_updates | 690 |
| policy_gradient_loss | -0.00577 |
| std | 1.09 |
| value_loss | 0.0116 |
------------------------------------------
Eval num_timesteps=1150000, episode_reward=-26.91 +/- 26.08
Episode length: 2000.00 +/- 0.00
-----------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -26.9 |
| time/ | |
| total_timesteps | 1150000 |
| train/ | |
| approx_kl | 0.006060633 |
| clip_fraction | 0.0603 |
| clip_range | 0.2 |
| entropy_loss | -3.02 |
| explained_variance | 0.905 |
| learning_rate | 0.0003 |
| loss | -0.0374 |
| n_updates | 700 |
| policy_gradient_loss | -0.00442 |
| std | 1.1 |
| value_loss | 0.0101 |
-----------------------------------------
--------------------------------
| time/ | |
| fps | 1847 |
| iterations | 71 |
| time_elapsed | 629 |
| total_timesteps | 1163264 |
--------------------------------
------------------------------------------
| time/ | |
| fps | 1859 |
| iterations | 72 |
| time_elapsed | 634 |
| total_timesteps | 1179648 |
| train/ | |
| approx_kl | 0.0070389216 |
| clip_fraction | 0.0728 |
| clip_range | 0.2 |
| entropy_loss | -3.03 |
| explained_variance | 0.854 |
| learning_rate | 0.0003 |
| loss | -0.0409 |
| n_updates | 710 |
| policy_gradient_loss | -0.00505 |
| std | 1.1 |
| value_loss | 0.0196 |
------------------------------------------
------------------------------------------
| time/ | |
| fps | 1871 |
| iterations | 73 |
| time_elapsed | 638 |
| total_timesteps | 1196032 |
| train/ | |
| approx_kl | 0.0055403598 |
| clip_fraction | 0.0567 |
| clip_range | 0.2 |
| entropy_loss | -3.03 |
| explained_variance | 0.906 |
| learning_rate | 0.0003 |
| loss | -0.0324 |
| n_updates | 720 |
| policy_gradient_loss | -0.00494 |
| std | 1.1 |
| value_loss | 0.0109 |
------------------------------------------
Eval num_timesteps=1200000, episode_reward=-23.57 +/- 26.30
Episode length: 2000.00 +/- 0.00
------------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -23.6 |
| time/ | |
| total_timesteps | 1200000 |
| train/ | |
| approx_kl | 0.0055604624 |
| clip_fraction | 0.0522 |
| clip_range | 0.2 |
| entropy_loss | -3.02 |
| explained_variance | 0.819 |
| learning_rate | 0.0003 |
| loss | -0.00379 |
| n_updates | 730 |
| policy_gradient_loss | -0.00374 |
| std | 1.1 |
| value_loss | 0.0453 |
------------------------------------------
[Diag @ 1,200,000 | n_sheep=2 | success=0%]
COMPACT_CANT_DRIVE 15/20
NEVER_COMPACT 4/20
DROVE_NO_SHEEP 1/20
action_mag mean=0.399 p10=0.067 p90=0.794 (0=stopped, 1=full speed)
min_flock_radius mean=2.96m best=0.00m (target <5m to compact)
min_dog_to_com mean=2.17m best=0.14m (FLEE_DIST=7m)
min_com_to_pen mean=11.07m best=2.66m
reward/step (mean): progress=+0.0064 alignment=+0.0000 pen_bonus=+0.0003 step_cost=-0.0200 complete=+0.0000
[Curriculum] leaving stage n_sheep=2 after 600,000 steps | training success rate (last 100 eps) = 0%
[Curriculum] → 3 sheep at step 1,200,000
--------------------------------
| time/ | |
| fps | 1828 |
| iterations | 74 |
| time_elapsed | 663 |
| total_timesteps | 1212416 |
--------------------------------
-----------------------------------------
| time/ | |
| fps | 1839 |
| iterations | 75 |
| time_elapsed | 668 |
| total_timesteps | 1228800 |
| train/ | |
| approx_kl | 0.007044647 |
| clip_fraction | 0.0819 |
| clip_range | 0.2 |
| entropy_loss | -3.02 |
| explained_variance | 0.902 |
| learning_rate | 0.0003 |
| loss | -0.00823 |
| n_updates | 740 |
| policy_gradient_loss | -0.00327 |
| std | 1.1 |
| value_loss | 0.042 |
-----------------------------------------
------------------------------------------
| time/ | |
| fps | 1849 |
| iterations | 76 |
| time_elapsed | 673 |
| total_timesteps | 1245184 |
| train/ | |
| approx_kl | 0.0064169513 |
| clip_fraction | 0.0699 |
| clip_range | 0.2 |
| entropy_loss | -3.03 |
| explained_variance | 0.928 |
| learning_rate | 0.0003 |
| loss | -0.0323 |
| n_updates | 750 |
| policy_gradient_loss | -0.00459 |
| std | 1.1 |
| value_loss | 0.0102 |
------------------------------------------
Eval num_timesteps=1250000, episode_reward=-27.97 +/- 37.55
Episode length: 2000.00 +/- 0.00
-----------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -28 |
| time/ | |
| total_timesteps | 1250000 |
| train/ | |
| approx_kl | 0.006859841 |
| clip_fraction | 0.0783 |
| clip_range | 0.2 |
| entropy_loss | -3.04 |
| explained_variance | 0.94 |
| learning_rate | 0.0003 |
| loss | -0.0368 |
| n_updates | 760 |
| policy_gradient_loss | -0.00472 |
| std | 1.11 |
| value_loss | 0.00931 |
-----------------------------------------
--------------------------------
| time/ | |
| fps | 1825 |
| iterations | 77 |
| time_elapsed | 691 |
| total_timesteps | 1261568 |
--------------------------------
------------------------------------------
| time/ | |
| fps | 1836 |
| iterations | 78 |
| time_elapsed | 696 |
| total_timesteps | 1277952 |
| train/ | |
| approx_kl | 0.0066901552 |
| clip_fraction | 0.0704 |
| clip_range | 0.2 |
| entropy_loss | -3.04 |
| explained_variance | 0.942 |
| learning_rate | 0.0003 |
| loss | -0.0329 |
| n_updates | 770 |
| policy_gradient_loss | -0.00458 |
| std | 1.11 |
| value_loss | 0.00938 |
------------------------------------------
-----------------------------------------
| time/ | |
| fps | 1845 |
| iterations | 79 |
| time_elapsed | 701 |
| total_timesteps | 1294336 |
| train/ | |
| approx_kl | 0.007008245 |
| clip_fraction | 0.082 |
| clip_range | 0.2 |
| entropy_loss | -3.03 |
| explained_variance | 0.899 |
| learning_rate | 0.0003 |
| loss | -0.0194 |
| n_updates | 780 |
| policy_gradient_loss | -0.00426 |
| std | 1.1 |
| value_loss | 0.052 |
-----------------------------------------
Eval num_timesteps=1300000, episode_reward=-41.12 +/- 37.68
Episode length: 2000.00 +/- 0.00
------------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -41.1 |
| time/ | |
| total_timesteps | 1300000 |
| train/ | |
| approx_kl | 0.0070775724 |
| clip_fraction | 0.0742 |
| clip_range | 0.2 |
| entropy_loss | -3.03 |
| explained_variance | 0.942 |
| learning_rate | 0.0003 |
| loss | -0.0238 |
| n_updates | 790 |
| policy_gradient_loss | -0.0052 |
| std | 1.11 |
| value_loss | 0.00657 |
------------------------------------------
--------------------------------
| time/ | |
| fps | 1823 |
| iterations | 80 |
| time_elapsed | 718 |
| total_timesteps | 1310720 |
--------------------------------
-----------------------------------------
| time/ | |
| fps | 1832 |
| iterations | 81 |
| time_elapsed | 724 |
| total_timesteps | 1327104 |
| train/ | |
| approx_kl | 0.008046751 |
| clip_fraction | 0.0851 |
| clip_range | 0.2 |
| entropy_loss | -3.04 |
| explained_variance | 0.897 |
| learning_rate | 0.0003 |
| loss | -0.0384 |
| n_updates | 800 |
| policy_gradient_loss | -0.0057 |
| std | 1.11 |
| value_loss | 0.009 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 1840 |
| iterations | 82 |
| time_elapsed | 730 |
| total_timesteps | 1343488 |
| train/ | |
| approx_kl | 0.006007643 |
| clip_fraction | 0.0548 |
| clip_range | 0.2 |
| entropy_loss | -3.06 |
| explained_variance | 0.871 |
| learning_rate | 0.0003 |
| loss | -0.0251 |
| n_updates | 810 |
| policy_gradient_loss | -0.00416 |
| std | 1.12 |
| value_loss | 0.0179 |
-----------------------------------------
Eval num_timesteps=1350000, episode_reward=-24.46 +/- 41.24
Episode length: 2000.00 +/- 0.00
------------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -24.5 |
| time/ | |
| total_timesteps | 1350000 |
| train/ | |
| approx_kl | 0.0065572546 |
| clip_fraction | 0.0698 |
| clip_range | 0.2 |
| entropy_loss | -3.08 |
| explained_variance | 0.877 |
| learning_rate | 0.0003 |
| loss | -0.0219 |
| n_updates | 820 |
| policy_gradient_loss | -0.00456 |
| std | 1.13 |
| value_loss | 0.0242 |
------------------------------------------
[Diag @ 1,350,000 | n_sheep=3 | success=0%]
NEVER_COMPACT 14/20
COMPACT_CANT_DRIVE 6/20
action_mag mean=0.195 p10=0.018 p90=0.576 (0=stopped, 1=full speed)
min_flock_radius mean=6.32m best=1.36m (target <5m to compact)
min_dog_to_com mean=4.15m best=0.61m (FLEE_DIST=7m)
min_com_to_pen mean=11.37m best=4.88m
reward/step (mean): progress=+0.0029 alignment=+0.0000 pen_bonus=+0.0000 step_cost=-0.0200 complete=+0.0000
--------------------------------
| time/ | |
| fps | 1798 |
| iterations | 83 |
| time_elapsed | 756 |
| total_timesteps | 1359872 |
--------------------------------
------------------------------------------
| time/ | |
| fps | 1809 |
| iterations | 84 |
| time_elapsed | 760 |
| total_timesteps | 1376256 |
| train/ | |
| approx_kl | 0.0072198315 |
| clip_fraction | 0.0764 |
| clip_range | 0.2 |
| entropy_loss | -3.08 |
| explained_variance | 0.909 |
| learning_rate | 0.0003 |
| loss | -0.0208 |
| n_updates | 830 |
| policy_gradient_loss | -0.00626 |
| std | 1.13 |
| value_loss | 0.0106 |
------------------------------------------
------------------------------------------
| time/ | |
| fps | 1817 |
| iterations | 85 |
| time_elapsed | 766 |
| total_timesteps | 1392640 |
| train/ | |
| approx_kl | 0.0070813587 |
| clip_fraction | 0.0733 |
| clip_range | 0.2 |
| entropy_loss | -3.08 |
| explained_variance | 0.907 |
| learning_rate | 0.0003 |
| loss | -0.0324 |
| n_updates | 840 |
| policy_gradient_loss | -0.00505 |
| std | 1.13 |
| value_loss | 0.0166 |
------------------------------------------
Eval num_timesteps=1400000, episode_reward=-36.32 +/- 33.15
Episode length: 2000.00 +/- 0.00
------------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -36.3 |
| time/ | |
| total_timesteps | 1400000 |
| train/ | |
| approx_kl | 0.0067584305 |
| clip_fraction | 0.08 |
| clip_range | 0.2 |
| entropy_loss | -3.08 |
| explained_variance | 0.906 |
| learning_rate | 0.0003 |
| loss | -0.0308 |
| n_updates | 850 |
| policy_gradient_loss | -0.0054 |
| std | 1.13 |
| value_loss | 0.0112 |
------------------------------------------
--------------------------------
| time/ | |
| fps | 1798 |
| iterations | 86 |
| time_elapsed | 783 |
| total_timesteps | 1409024 |
--------------------------------
-----------------------------------------
| time/ | |
| fps | 1807 |
| iterations | 87 |
| time_elapsed | 788 |
| total_timesteps | 1425408 |
| train/ | |
| approx_kl | 0.007411341 |
| clip_fraction | 0.0716 |
| clip_range | 0.2 |
| entropy_loss | -3.09 |
| explained_variance | 0.904 |
| learning_rate | 0.0003 |
| loss | -0.0322 |
| n_updates | 860 |
| policy_gradient_loss | -0.00641 |
| std | 1.14 |
| value_loss | 0.0191 |
-----------------------------------------
------------------------------------------
| time/ | |
| fps | 1815 |
| iterations | 88 |
| time_elapsed | 794 |
| total_timesteps | 1441792 |
| train/ | |
| approx_kl | 0.0077011855 |
| clip_fraction | 0.0774 |
| clip_range | 0.2 |
| entropy_loss | -3.09 |
| explained_variance | 0.914 |
| learning_rate | 0.0003 |
| loss | -0.0316 |
| n_updates | 870 |
| policy_gradient_loss | -0.00545 |
| std | 1.13 |
| value_loss | 0.0148 |
------------------------------------------
Eval num_timesteps=1450000, episode_reward=-40.58 +/- 38.17
Episode length: 2000.00 +/- 0.00
-----------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -40.6 |
| time/ | |
| total_timesteps | 1450000 |
| train/ | |
| approx_kl | 0.007694071 |
| clip_fraction | 0.0816 |
| clip_range | 0.2 |
| entropy_loss | -3.07 |
| explained_variance | 0.937 |
| learning_rate | 0.0003 |
| loss | -0.036 |
| n_updates | 880 |
| policy_gradient_loss | -0.0054 |
| std | 1.12 |
| value_loss | 0.0111 |
-----------------------------------------
--------------------------------
| time/ | |
| fps | 1796 |
| iterations | 89 |
| time_elapsed | 811 |
| total_timesteps | 1458176 |
--------------------------------
-----------------------------------------
| time/ | |
| fps | 1805 |
| iterations | 90 |
| time_elapsed | 816 |
| total_timesteps | 1474560 |
| train/ | |
| approx_kl | 0.007034345 |
| clip_fraction | 0.0693 |
| clip_range | 0.2 |
| entropy_loss | -3.07 |
| explained_variance | 0.924 |
| learning_rate | 0.0003 |
| loss | 0.0472 |
| n_updates | 890 |
| policy_gradient_loss | -0.00472 |
| std | 1.13 |
| value_loss | 0.0352 |
-----------------------------------------
------------------------------------------
| time/ | |
| fps | 1815 |
| iterations | 91 |
| time_elapsed | 821 |
| total_timesteps | 1490944 |
| train/ | |
| approx_kl | 0.0078114523 |
| clip_fraction | 0.0917 |
| clip_range | 0.2 |
| entropy_loss | -3.08 |
| explained_variance | 0.942 |
| learning_rate | 0.0003 |
| loss | -0.0461 |
| n_updates | 900 |
| policy_gradient_loss | -0.00668 |
| std | 1.13 |
| value_loss | 0.00844 |
------------------------------------------
Eval num_timesteps=1500000, episode_reward=-19.66 +/- 25.98
Episode length: 2000.00 +/- 0.00
------------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -19.7 |
| time/ | |
| total_timesteps | 1500000 |
| train/ | |
| approx_kl | 0.0067999987 |
| clip_fraction | 0.0606 |
| clip_range | 0.2 |
| entropy_loss | -3.08 |
| explained_variance | 0.893 |
| learning_rate | 0.0003 |
| loss | -0.0283 |
| n_updates | 910 |
| policy_gradient_loss | -0.00385 |
| std | 1.12 |
| value_loss | 0.0409 |
------------------------------------------
New best mean reward!
[Diag @ 1,500,000 | n_sheep=3 | success=0%]
COMPACT_CANT_DRIVE 11/20
NEVER_COMPACT 7/20
DROVE_NO_SHEEP 2/20
action_mag mean=0.185 p10=0.015 p90=0.426 (0=stopped, 1=full speed)
min_flock_radius mean=4.43m best=1.38m (target <5m to compact)
min_dog_to_com mean=2.89m best=0.07m (FLEE_DIST=7m)
min_com_to_pen mean=11.88m best=2.23m
reward/step (mean): progress=+0.0008 alignment=+0.0000 pen_bonus=+0.0003 step_cost=-0.0200 complete=+0.0000
--------------------------------
| time/ | |
| fps | 1781 |
| iterations | 92 |
| time_elapsed | 846 |
| total_timesteps | 1507328 |
--------------------------------
------------------------------------------
| time/ | |
| fps | 1789 |
| iterations | 93 |
| time_elapsed | 851 |
| total_timesteps | 1523712 |
| train/ | |
| approx_kl | 0.0069550863 |
| clip_fraction | 0.0787 |
| clip_range | 0.2 |
| entropy_loss | -3.08 |
| explained_variance | 0.897 |
| learning_rate | 0.0003 |
| loss | -0.0204 |
| n_updates | 920 |
| policy_gradient_loss | -0.00394 |
| std | 1.13 |
| value_loss | 0.0324 |
------------------------------------------
-----------------------------------------
| time/ | |
| fps | 1798 |
| iterations | 94 |
| time_elapsed | 856 |
| total_timesteps | 1540096 |
| train/ | |
| approx_kl | 0.006749108 |
| clip_fraction | 0.0787 |
| clip_range | 0.2 |
| entropy_loss | -3.08 |
| explained_variance | 0.929 |
| learning_rate | 0.0003 |
| loss | -0.0338 |
| n_updates | 930 |
| policy_gradient_loss | -0.00534 |
| std | 1.13 |
| value_loss | 0.00967 |
-----------------------------------------
Eval num_timesteps=1550000, episode_reward=-26.47 +/- 25.94
Episode length: 2000.00 +/- 0.00
------------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -26.5 |
| time/ | |
| total_timesteps | 1550000 |
| train/ | |
| approx_kl | 0.0073381998 |
| clip_fraction | 0.0679 |
| clip_range | 0.2 |
| entropy_loss | -3.08 |
| explained_variance | 0.919 |
| learning_rate | 0.0003 |
| loss | -0.0259 |
| n_updates | 940 |
| policy_gradient_loss | -0.00554 |
| std | 1.13 |
| value_loss | 0.00999 |
------------------------------------------
--------------------------------
| time/ | |
| fps | 1782 |
| iterations | 95 |
| time_elapsed | 873 |
| total_timesteps | 1556480 |
--------------------------------
------------------------------------------
| time/ | |
| fps | 1790 |
| iterations | 96 |
| time_elapsed | 878 |
| total_timesteps | 1572864 |
| train/ | |
| approx_kl | 0.0071112993 |
| clip_fraction | 0.0781 |
| clip_range | 0.2 |
| entropy_loss | -3.08 |
| explained_variance | 0.929 |
| learning_rate | 0.0003 |
| loss | -0.0324 |
| n_updates | 950 |
| policy_gradient_loss | -0.00428 |
| std | 1.13 |
| value_loss | 0.0246 |
------------------------------------------
------------------------------------------
| time/ | |
| fps | 1798 |
| iterations | 97 |
| time_elapsed | 883 |
| total_timesteps | 1589248 |
| train/ | |
| approx_kl | 0.0077134473 |
| clip_fraction | 0.0784 |
| clip_range | 0.2 |
| entropy_loss | -3.08 |
| explained_variance | 0.917 |
| learning_rate | 0.0003 |
| loss | -0.0365 |
| n_updates | 960 |
| policy_gradient_loss | -0.00445 |
| std | 1.13 |
| value_loss | 0.0122 |
------------------------------------------
Eval num_timesteps=1600000, episode_reward=-35.13 +/- 31.01
Episode length: 2000.00 +/- 0.00
------------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -35.1 |
| time/ | |
| total_timesteps | 1600000 |
| train/ | |
| approx_kl | 0.0070123896 |
| clip_fraction | 0.0712 |
| clip_range | 0.2 |
| entropy_loss | -3.07 |
| explained_variance | 0.919 |
| learning_rate | 0.0003 |
| loss | -0.026 |
| n_updates | 970 |
| policy_gradient_loss | -0.00519 |
| std | 1.13 |
| value_loss | 0.0171 |
------------------------------------------
--------------------------------
| time/ | |
| fps | 1781 |
| iterations | 98 |
| time_elapsed | 901 |
| total_timesteps | 1605632 |
--------------------------------
-----------------------------------------
| time/ | |
| fps | 1789 |
| iterations | 99 |
| time_elapsed | 906 |
| total_timesteps | 1622016 |
| train/ | |
| approx_kl | 0.007990176 |
| clip_fraction | 0.0845 |
| clip_range | 0.2 |
| entropy_loss | -3.07 |
| explained_variance | 0.873 |
| learning_rate | 0.0003 |
| loss | -0.04 |
| n_updates | 980 |
| policy_gradient_loss | -0.0045 |
| std | 1.13 |
| value_loss | 0.0153 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 1798 |
| iterations | 100 |
| time_elapsed | 911 |
| total_timesteps | 1638400 |
| train/ | |
| approx_kl | 0.006477687 |
| clip_fraction | 0.0593 |
| clip_range | 0.2 |
| entropy_loss | -3.07 |
| explained_variance | 0.946 |
| learning_rate | 0.0003 |
| loss | -0.0396 |
| n_updates | 990 |
| policy_gradient_loss | -0.00442 |
| std | 1.13 |
| value_loss | 0.0107 |
-----------------------------------------
Eval num_timesteps=1650000, episode_reward=-31.86 +/- 47.05
Episode length: 2000.00 +/- 0.00
-----------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -31.9 |
| time/ | |
| total_timesteps | 1650000 |
| train/ | |
| approx_kl | 0.006796476 |
| clip_fraction | 0.0672 |
| clip_range | 0.2 |
| entropy_loss | -3.08 |
| explained_variance | 0.929 |
| learning_rate | 0.0003 |
| loss | -0.0264 |
| n_updates | 1000 |
| policy_gradient_loss | -0.00375 |
| std | 1.13 |
| value_loss | 0.0385 |
-----------------------------------------
[Diag @ 1,650,000 | n_sheep=3 | success=0%]
NEVER_COMPACT 11/20
COMPACT_CANT_DRIVE 9/20
action_mag mean=0.154 p10=0.005 p90=0.398 (0=stopped, 1=full speed)
min_flock_radius mean=5.81m best=0.00m (target <5m to compact)
min_dog_to_com mean=3.22m best=0.52m (FLEE_DIST=7m)
min_com_to_pen mean=13.42m best=7.08m
reward/step (mean): progress=+0.0061 alignment=+0.0000 pen_bonus=+0.0010 step_cost=-0.0200 complete=+0.0000
--------------------------------
| time/ | |
| fps | 1768 |
| iterations | 101 |
| time_elapsed | 935 |
| total_timesteps | 1654784 |
--------------------------------
----------------------------------------
| time/ | |
| fps | 1774 |
| iterations | 102 |
| time_elapsed | 941 |
| total_timesteps | 1671168 |
| train/ | |
| approx_kl | 0.00682881 |
| clip_fraction | 0.0694 |
| clip_range | 0.2 |
| entropy_loss | -3.08 |
| explained_variance | 0.939 |
| learning_rate | 0.0003 |
| loss | -0.0233 |
| n_updates | 1010 |
| policy_gradient_loss | -0.00461 |
| std | 1.13 |
| value_loss | 0.0183 |
----------------------------------------
------------------------------------------
| time/ | |
| fps | 1779 |
| iterations | 103 |
| time_elapsed | 948 |
| total_timesteps | 1687552 |
| train/ | |
| approx_kl | 0.0071003223 |
| clip_fraction | 0.0782 |
| clip_range | 0.2 |
| entropy_loss | -3.1 |
| explained_variance | 0.923 |
| learning_rate | 0.0003 |
| loss | -0.0398 |
| n_updates | 1020 |
| policy_gradient_loss | -0.00491 |
| std | 1.15 |
| value_loss | 0.0101 |
------------------------------------------
Eval num_timesteps=1700000, episode_reward=-32.11 +/- 36.59
Episode length: 2000.00 +/- 0.00
------------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -32.1 |
| time/ | |
| total_timesteps | 1700000 |
| train/ | |
| approx_kl | 0.0064870613 |
| clip_fraction | 0.0624 |
| clip_range | 0.2 |
| entropy_loss | -3.13 |
| explained_variance | 0.909 |
| learning_rate | 0.0003 |
| loss | -0.0365 |
| n_updates | 1030 |
| policy_gradient_loss | -0.00404 |
| std | 1.17 |
| value_loss | 0.00855 |
------------------------------------------
--------------------------------
| time/ | |
| fps | 1762 |
| iterations | 104 |
| time_elapsed | 966 |
| total_timesteps | 1703936 |
--------------------------------
-----------------------------------------
| time/ | |
| fps | 1769 |
| iterations | 105 |
| time_elapsed | 972 |
| total_timesteps | 1720320 |
| train/ | |
| approx_kl | 0.007349294 |
| clip_fraction | 0.0833 |
| clip_range | 0.2 |
| entropy_loss | -3.15 |
| explained_variance | 0.926 |
| learning_rate | 0.0003 |
| loss | -0.0358 |
| n_updates | 1040 |
| policy_gradient_loss | -0.00514 |
| std | 1.17 |
| value_loss | 0.00848 |
-----------------------------------------
------------------------------------------
| time/ | |
| fps | 1777 |
| iterations | 106 |
| time_elapsed | 976 |
| total_timesteps | 1736704 |
| train/ | |
| approx_kl | 0.0070306472 |
| clip_fraction | 0.0814 |
| clip_range | 0.2 |
| entropy_loss | -3.15 |
| explained_variance | 0.887 |
| learning_rate | 0.0003 |
| loss | -0.0359 |
| n_updates | 1050 |
| policy_gradient_loss | -0.00489 |
| std | 1.17 |
| value_loss | 0.0134 |
------------------------------------------
Eval num_timesteps=1750000, episode_reward=-34.24 +/- 43.23
Episode length: 2000.00 +/- 0.00
-----------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -34.2 |
| time/ | |
| total_timesteps | 1750000 |
| train/ | |
| approx_kl | 0.008487761 |
| clip_fraction | 0.102 |
| clip_range | 0.2 |
| entropy_loss | -3.15 |
| explained_variance | 0.962 |
| learning_rate | 0.0003 |
| loss | -0.0369 |
| n_updates | 1060 |
| policy_gradient_loss | -0.0077 |
| std | 1.17 |
| value_loss | 0.00786 |
-----------------------------------------
--------------------------------
| time/ | |
| fps | 1762 |
| iterations | 107 |
| time_elapsed | 994 |
| total_timesteps | 1753088 |
--------------------------------
------------------------------------------
| time/ | |
| fps | 1766 |
| iterations | 108 |
| time_elapsed | 1001 |
| total_timesteps | 1769472 |
| train/ | |
| approx_kl | 0.0074267983 |
| clip_fraction | 0.0742 |
| clip_range | 0.2 |
| entropy_loss | -3.15 |
| explained_variance | 0.939 |
| learning_rate | 0.0003 |
| loss | -0.0404 |
| n_updates | 1070 |
| policy_gradient_loss | -0.00575 |
| std | 1.18 |
| value_loss | 0.0158 |
------------------------------------------
------------------------------------------
| time/ | |
| fps | 1772 |
| iterations | 109 |
| time_elapsed | 1007 |
| total_timesteps | 1785856 |
| train/ | |
| approx_kl | 0.0075380025 |
| clip_fraction | 0.074 |
| clip_range | 0.2 |
| entropy_loss | -3.15 |
| explained_variance | 0.961 |
| learning_rate | 0.0003 |
| loss | -0.034 |
| n_updates | 1080 |
| policy_gradient_loss | -0.00553 |
| std | 1.17 |
| value_loss | 0.00651 |
------------------------------------------
Eval num_timesteps=1800000, episode_reward=-31.16 +/- 37.32
Episode length: 2000.00 +/- 0.00
-----------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -31.2 |
| time/ | |
| total_timesteps | 1800000 |
| train/ | |
| approx_kl | 0.007386248 |
| clip_fraction | 0.0843 |
| clip_range | 0.2 |
| entropy_loss | -3.15 |
| explained_variance | 0.922 |
| learning_rate | 0.0003 |
| loss | -0.0419 |
| n_updates | 1090 |
| policy_gradient_loss | -0.00596 |
| std | 1.17 |
| value_loss | 0.00858 |
-----------------------------------------
[Diag @ 1,800,000 | n_sheep=3 | success=0%]
NEVER_COMPACT 17/20
COMPACT_CANT_DRIVE 3/20
action_mag mean=0.164 p10=0.007 p90=0.418 (0=stopped, 1=full speed)
min_flock_radius mean=7.52m best=2.00m (target <5m to compact)
min_dog_to_com mean=2.24m best=0.21m (FLEE_DIST=7m)
min_com_to_pen mean=12.87m best=3.90m
reward/step (mean): progress=-0.0007 alignment=+0.0000 pen_bonus=+0.0005 step_cost=-0.0200 complete=+0.0000
[Curriculum] leaving stage n_sheep=3 after 600,000 steps | training success rate (last 100 eps) = 0%
[Curriculum] → 4 sheep at step 1,800,000
--------------------------------
| time/ | |
| fps | 1743 |
| iterations | 110 |
| time_elapsed | 1033 |
| total_timesteps | 1802240 |
--------------------------------
-----------------------------------------
| time/ | |
| fps | 1749 |
| iterations | 111 |
| time_elapsed | 1039 |
| total_timesteps | 1818624 |
| train/ | |
| approx_kl | 0.009158293 |
| clip_fraction | 0.0991 |
| clip_range | 0.2 |
| entropy_loss | -3.15 |
| explained_variance | 0.893 |
| learning_rate | 0.0003 |
| loss | -0.0414 |
| n_updates | 1100 |
| policy_gradient_loss | -0.00701 |
| std | 1.17 |
| value_loss | 0.0237 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 1755 |
| iterations | 112 |
| time_elapsed | 1045 |
| total_timesteps | 1835008 |
| train/ | |
| approx_kl | 0.007241189 |
| clip_fraction | 0.0831 |
| clip_range | 0.2 |
| entropy_loss | -3.15 |
| explained_variance | 0.874 |
| learning_rate | 0.0003 |
| loss | -0.0241 |
| n_updates | 1110 |
| policy_gradient_loss | -0.00634 |
| std | 1.17 |
| value_loss | 0.0226 |
-----------------------------------------
Eval num_timesteps=1850000, episode_reward=-29.45 +/- 31.10
Episode length: 2000.00 +/- 0.00
---------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -29.5 |
| time/ | |
| total_timesteps | 1850000 |
| train/ | |
| approx_kl | 0.0078688 |
| clip_fraction | 0.0777 |
| clip_range | 0.2 |
| entropy_loss | -3.15 |
| explained_variance | 0.895 |
| learning_rate | 0.0003 |
| loss | -0.036 |
| n_updates | 1120 |
| policy_gradient_loss | -0.00602 |
| std | 1.17 |
| value_loss | 0.0128 |
---------------------------------------
--------------------------------
| time/ | |
| fps | 1742 |
| iterations | 113 |
| time_elapsed | 1062 |
| total_timesteps | 1851392 |
--------------------------------
-----------------------------------------
| time/ | |
| fps | 1749 |
| iterations | 114 |
| time_elapsed | 1067 |
| total_timesteps | 1867776 |
| train/ | |
| approx_kl | 0.008158936 |
| clip_fraction | 0.0963 |
| clip_range | 0.2 |
| entropy_loss | -3.14 |
| explained_variance | 0.897 |
| learning_rate | 0.0003 |
| loss | -0.0324 |
| n_updates | 1130 |
| policy_gradient_loss | -0.00854 |
| std | 1.17 |
| value_loss | 0.0144 |
-----------------------------------------
------------------------------------------
| time/ | |
| fps | 1754 |
| iterations | 115 |
| time_elapsed | 1073 |
| total_timesteps | 1884160 |
| train/ | |
| approx_kl | 0.0074978825 |
| clip_fraction | 0.0844 |
| clip_range | 0.2 |
| entropy_loss | -3.14 |
| explained_variance | 0.92 |
| learning_rate | 0.0003 |
| loss | -0.0246 |
| n_updates | 1140 |
| policy_gradient_loss | -0.00578 |
| std | 1.16 |
| value_loss | 0.0134 |
------------------------------------------
Eval num_timesteps=1900000, episode_reward=-38.21 +/- 31.08
Episode length: 2000.00 +/- 0.00
----------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -38.2 |
| time/ | |
| total_timesteps | 1900000 |
| train/ | |
| approx_kl | 0.00678163 |
| clip_fraction | 0.0711 |
| clip_range | 0.2 |
| entropy_loss | -3.15 |
| explained_variance | 0.892 |
| learning_rate | 0.0003 |
| loss | -0.0345 |
| n_updates | 1150 |
| policy_gradient_loss | -0.00409 |
| std | 1.18 |
| value_loss | 0.0221 |
----------------------------------------
--------------------------------
| time/ | |
| fps | 1740 |
| iterations | 116 |
| time_elapsed | 1091 |
| total_timesteps | 1900544 |
--------------------------------
-----------------------------------------
| time/ | |
| fps | 1746 |
| iterations | 117 |
| time_elapsed | 1097 |
| total_timesteps | 1916928 |
| train/ | |
| approx_kl | 0.006992462 |
| clip_fraction | 0.0731 |
| clip_range | 0.2 |
| entropy_loss | -3.16 |
| explained_variance | 0.895 |
| learning_rate | 0.0003 |
| loss | -0.0243 |
| n_updates | 1160 |
| policy_gradient_loss | -0.00588 |
| std | 1.18 |
| value_loss | 0.0145 |
-----------------------------------------
------------------------------------------
| time/ | |
| fps | 1750 |
| iterations | 118 |
| time_elapsed | 1104 |
| total_timesteps | 1933312 |
| train/ | |
| approx_kl | 0.0069225584 |
| clip_fraction | 0.068 |
| clip_range | 0.2 |
| entropy_loss | -3.15 |
| explained_variance | 0.905 |
| learning_rate | 0.0003 |
| loss | -0.0297 |
| n_updates | 1170 |
| policy_gradient_loss | -0.00516 |
| std | 1.17 |
| value_loss | 0.0153 |
------------------------------------------
-----------------------------------------
| time/ | |
| fps | 1756 |
| iterations | 119 |
| time_elapsed | 1109 |
| total_timesteps | 1949696 |
| train/ | |
| approx_kl | 0.005966103 |
| clip_fraction | 0.059 |
| clip_range | 0.2 |
| entropy_loss | -3.15 |
| explained_variance | 0.896 |
| learning_rate | 0.0003 |
| loss | -0.0337 |
| n_updates | 1180 |
| policy_gradient_loss | -0.00413 |
| std | 1.17 |
| value_loss | 0.0091 |
-----------------------------------------
Eval num_timesteps=1950000, episode_reward=-59.72 +/- 38.15
Episode length: 2000.00 +/- 0.00
------------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -59.7 |
| time/ | |
| total_timesteps | 1950000 |
| train/ | |
| approx_kl | 0.0067311125 |
| clip_fraction | 0.0733 |
| clip_range | 0.2 |
| entropy_loss | -3.16 |
| explained_variance | 0.861 |
| learning_rate | 0.0003 |
| loss | -0.0147 |
| n_updates | 1190 |
| policy_gradient_loss | -0.00459 |
| std | 1.18 |
| value_loss | 0.0083 |
------------------------------------------
[Diag @ 1,950,000 | n_sheep=4 | success=0%]
NEVER_COMPACT 14/20
COMPACT_CANT_DRIVE 6/20
action_mag mean=0.325 p10=0.025 p90=0.778 (0=stopped, 1=full speed)
min_flock_radius mean=7.27m best=2.17m (target <5m to compact)
min_dog_to_com mean=3.74m best=0.07m (FLEE_DIST=7m)
min_com_to_pen mean=13.01m best=6.24m
reward/step (mean): progress=+0.0026 alignment=+0.0000 pen_bonus=+0.0005 step_cost=-0.0200 complete=+0.0000
--------------------------------
| time/ | |
| fps | 1728 |
| iterations | 120 |
| time_elapsed | 1137 |
| total_timesteps | 1966080 |
--------------------------------
------------------------------------------
| time/ | |
| fps | 1734 |
| iterations | 121 |
| time_elapsed | 1143 |
| total_timesteps | 1982464 |
| train/ | |
| approx_kl | 0.0061555626 |
| clip_fraction | 0.0631 |
| clip_range | 0.2 |
| entropy_loss | -3.17 |
| explained_variance | 0.932 |
| learning_rate | 0.0003 |
| loss | -0.0328 |
| n_updates | 1200 |
| policy_gradient_loss | -0.00446 |
| std | 1.19 |
| value_loss | 0.0133 |
------------------------------------------
------------------------------------------
| time/ | |
| fps | 1739 |
| iterations | 122 |
| time_elapsed | 1149 |
| total_timesteps | 1998848 |
| train/ | |
| approx_kl | 0.0060347347 |
| clip_fraction | 0.057 |
| clip_range | 0.2 |
| entropy_loss | -3.18 |
| explained_variance | 0.841 |
| learning_rate | 0.0003 |
| loss | -0.0352 |
| n_updates | 1210 |
| policy_gradient_loss | -0.00322 |
| std | 1.19 |
| value_loss | 0.0104 |
------------------------------------------
Eval num_timesteps=2000000, episode_reward=-37.97 +/- 46.26
Episode length: 2000.00 +/- 0.00
------------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -38 |
| time/ | |
| total_timesteps | 2000000 |
| train/ | |
| approx_kl | 0.0063244104 |
| clip_fraction | 0.0675 |
| clip_range | 0.2 |
| entropy_loss | -3.18 |
| explained_variance | 0.865 |
| learning_rate | 0.0003 |
| loss | -0.0217 |
| n_updates | 1220 |
| policy_gradient_loss | -0.00489 |
| std | 1.2 |
| value_loss | 0.0219 |
------------------------------------------
--------------------------------
| time/ | |
| fps | 1725 |
| iterations | 123 |
| time_elapsed | 1167 |
| total_timesteps | 2015232 |
--------------------------------
-----------------------------------------
| time/ | |
| fps | 1730 |
| iterations | 124 |
| time_elapsed | 1173 |
| total_timesteps | 2031616 |
| train/ | |
| approx_kl | 0.007022621 |
| clip_fraction | 0.0816 |
| clip_range | 0.2 |
| entropy_loss | -3.19 |
| explained_variance | 0.949 |
| learning_rate | 0.0003 |
| loss | -0.0248 |
| n_updates | 1230 |
| policy_gradient_loss | -0.0053 |
| std | 1.19 |
| value_loss | 0.00677 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 1735 |
| iterations | 125 |
| time_elapsed | 1179 |
| total_timesteps | 2048000 |
| train/ | |
| approx_kl | 0.006686856 |
| clip_fraction | 0.0653 |
| clip_range | 0.2 |
| entropy_loss | -3.18 |
| explained_variance | 0.928 |
| learning_rate | 0.0003 |
| loss | -0.0333 |
| n_updates | 1240 |
| policy_gradient_loss | -0.00445 |
| std | 1.19 |
| value_loss | 0.00651 |
-----------------------------------------
Eval num_timesteps=2050000, episode_reward=-27.67 +/- 36.42
Episode length: 2000.00 +/- 0.00
-----------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -27.7 |
| time/ | |
| total_timesteps | 2050000 |
| train/ | |
| approx_kl | 0.006721792 |
| clip_fraction | 0.0675 |
| clip_range | 0.2 |
| entropy_loss | -3.2 |
| explained_variance | 0.921 |
| learning_rate | 0.0003 |
| loss | -0.0278 |
| n_updates | 1250 |
| policy_gradient_loss | -0.00408 |
| std | 1.21 |
| value_loss | 0.00793 |
-----------------------------------------
--------------------------------
| time/ | |
| fps | 1721 |
| iterations | 126 |
| time_elapsed | 1198 |
| total_timesteps | 2064384 |
--------------------------------
-----------------------------------------
| time/ | |
| fps | 1726 |
| iterations | 127 |
| time_elapsed | 1205 |
| total_timesteps | 2080768 |
| train/ | |
| approx_kl | 0.006730888 |
| clip_fraction | 0.0617 |
| clip_range | 0.2 |
| entropy_loss | -3.23 |
| explained_variance | 0.911 |
| learning_rate | 0.0003 |
| loss | -0.0276 |
| n_updates | 1260 |
| policy_gradient_loss | -0.00378 |
| std | 1.22 |
| value_loss | 0.00964 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 1732 |
| iterations | 128 |
| time_elapsed | 1210 |
| total_timesteps | 2097152 |
| train/ | |
| approx_kl | 0.007725292 |
| clip_fraction | 0.0775 |
| clip_range | 0.2 |
| entropy_loss | -3.23 |
| explained_variance | 0.913 |
| learning_rate | 0.0003 |
| loss | -0.0371 |
| n_updates | 1270 |
| policy_gradient_loss | -0.006 |
| std | 1.22 |
| value_loss | 0.0109 |
-----------------------------------------
Eval num_timesteps=2100000, episode_reward=-40.56 +/- 44.37
Episode length: 2000.00 +/- 0.00
------------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -40.6 |
| time/ | |
| total_timesteps | 2100000 |
| train/ | |
| approx_kl | 0.0067186276 |
| clip_fraction | 0.0644 |
| clip_range | 0.2 |
| entropy_loss | -3.24 |
| explained_variance | 0.845 |
| learning_rate | 0.0003 |
| loss | -0.0357 |
| n_updates | 1280 |
| policy_gradient_loss | -0.00433 |
| std | 1.23 |
| value_loss | 0.0263 |
------------------------------------------
[Diag @ 2,100,000 | n_sheep=4 | success=0%]
NEVER_COMPACT 12/20
COMPACT_CANT_DRIVE 8/20
action_mag mean=0.384 p10=0.018 p90=0.884 (0=stopped, 1=full speed)
min_flock_radius mean=6.36m best=2.11m (target <5m to compact)
min_dog_to_com mean=2.94m best=0.40m (FLEE_DIST=7m)
min_com_to_pen mean=12.34m best=5.56m
reward/step (mean): progress=-0.0084 alignment=+0.0000 pen_bonus=+0.0003 step_cost=-0.0200 complete=+0.0000
--------------------------------
| time/ | |
| fps | 1706 |
| iterations | 129 |
| time_elapsed | 1238 |
| total_timesteps | 2113536 |
--------------------------------
-----------------------------------------
| time/ | |
| fps | 1712 |
| iterations | 130 |
| time_elapsed | 1243 |
| total_timesteps | 2129920 |
| train/ | |
| approx_kl | 0.006317258 |
| clip_fraction | 0.0623 |
| clip_range | 0.2 |
| entropy_loss | -3.26 |
| explained_variance | 0.912 |
| learning_rate | 0.0003 |
| loss | -0.0419 |
| n_updates | 1290 |
| policy_gradient_loss | -0.00427 |
| std | 1.24 |
| value_loss | 0.00859 |
-----------------------------------------
----------------------------------------
| time/ | |
| fps | 1716 |
| iterations | 131 |
| time_elapsed | 1250 |
| total_timesteps | 2146304 |
| train/ | |
| approx_kl | 0.00636432 |
| clip_fraction | 0.0698 |
| clip_range | 0.2 |
| entropy_loss | -3.28 |
| explained_variance | 0.851 |
| learning_rate | 0.0003 |
| loss | -0.0266 |
| n_updates | 1300 |
| policy_gradient_loss | -0.00374 |
| std | 1.25 |
| value_loss | 0.0299 |
----------------------------------------
Eval num_timesteps=2150000, episode_reward=-63.32 +/- 33.74
Episode length: 2000.00 +/- 0.00
------------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -63.3 |
| time/ | |
| total_timesteps | 2150000 |
| train/ | |
| approx_kl | 0.0060345423 |
| clip_fraction | 0.0563 |
| clip_range | 0.2 |
| entropy_loss | -3.27 |
| explained_variance | 0.898 |
| learning_rate | 0.0003 |
| loss | -0.0404 |
| n_updates | 1310 |
| policy_gradient_loss | -0.00356 |
| std | 1.24 |
| value_loss | 0.0205 |
------------------------------------------
--------------------------------
| time/ | |
| fps | 1704 |
| iterations | 132 |
| time_elapsed | 1268 |
| total_timesteps | 2162688 |
--------------------------------
-----------------------------------------
| time/ | |
| fps | 1709 |
| iterations | 133 |
| time_elapsed | 1274 |
| total_timesteps | 2179072 |
| train/ | |
| approx_kl | 0.007027424 |
| clip_fraction | 0.0693 |
| clip_range | 0.2 |
| entropy_loss | -3.25 |
| explained_variance | 0.9 |
| learning_rate | 0.0003 |
| loss | -0.0315 |
| n_updates | 1320 |
| policy_gradient_loss | -0.00521 |
| std | 1.23 |
| value_loss | 0.0194 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 1715 |
| iterations | 134 |
| time_elapsed | 1279 |
| total_timesteps | 2195456 |
| train/ | |
| approx_kl | 0.006112649 |
| clip_fraction | 0.0635 |
| clip_range | 0.2 |
| entropy_loss | -3.24 |
| explained_variance | 0.957 |
| learning_rate | 0.0003 |
| loss | -0.0339 |
| n_updates | 1330 |
| policy_gradient_loss | -0.00383 |
| std | 1.23 |
| value_loss | 0.00861 |
-----------------------------------------
Eval num_timesteps=2200000, episode_reward=-31.28 +/- 44.80
Episode length: 2000.00 +/- 0.00
------------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -31.3 |
| time/ | |
| total_timesteps | 2200000 |
| train/ | |
| approx_kl | 0.0070182728 |
| clip_fraction | 0.076 |
| clip_range | 0.2 |
| entropy_loss | -3.26 |
| explained_variance | 0.883 |
| learning_rate | 0.0003 |
| loss | -0.0412 |
| n_updates | 1340 |
| policy_gradient_loss | -0.00534 |
| std | 1.25 |
| value_loss | 0.013 |
------------------------------------------
--------------------------------
| time/ | |
| fps | 1704 |
| iterations | 135 |
| time_elapsed | 1297 |
| total_timesteps | 2211840 |
--------------------------------
------------------------------------------
| time/ | |
| fps | 1708 |
| iterations | 136 |
| time_elapsed | 1304 |
| total_timesteps | 2228224 |
| train/ | |
| approx_kl | 0.0062820893 |
| clip_fraction | 0.062 |
| clip_range | 0.2 |
| entropy_loss | -3.26 |
| explained_variance | 0.924 |
| learning_rate | 0.0003 |
| loss | -0.0377 |
| n_updates | 1350 |
| policy_gradient_loss | -0.00497 |
| std | 1.24 |
| value_loss | 0.00797 |
------------------------------------------
------------------------------------------
| time/ | |
| fps | 1713 |
| iterations | 137 |
| time_elapsed | 1310 |
| total_timesteps | 2244608 |
| train/ | |
| approx_kl | 0.0072454046 |
| clip_fraction | 0.0747 |
| clip_range | 0.2 |
| entropy_loss | -3.25 |
| explained_variance | 0.94 |
| learning_rate | 0.0003 |
| loss | -0.0366 |
| n_updates | 1360 |
| policy_gradient_loss | -0.00572 |
| std | 1.23 |
| value_loss | 0.00852 |
------------------------------------------
Eval num_timesteps=2250000, episode_reward=-36.00 +/- 38.67
Episode length: 2000.00 +/- 0.00
-----------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -36 |
| time/ | |
| total_timesteps | 2250000 |
| train/ | |
| approx_kl | 0.005690419 |
| clip_fraction | 0.0546 |
| clip_range | 0.2 |
| entropy_loss | -3.25 |
| explained_variance | 0.957 |
| learning_rate | 0.0003 |
| loss | -0.0376 |
| n_updates | 1370 |
| policy_gradient_loss | -0.00425 |
| std | 1.23 |
| value_loss | 0.00524 |
-----------------------------------------
[Diag @ 2,250,000 | n_sheep=4 | success=0%]
NEVER_COMPACT 13/20
COMPACT_CANT_DRIVE 7/20
action_mag mean=0.416 p10=0.038 p90=0.887 (0=stopped, 1=full speed)
min_flock_radius mean=6.62m best=2.03m (target <5m to compact)
min_dog_to_com mean=3.54m best=0.40m (FLEE_DIST=7m)
min_com_to_pen mean=14.24m best=9.65m
reward/step (mean): progress=-0.0070 alignment=+0.0000 pen_bonus=+0.0005 step_cost=-0.0200 complete=+0.0000
--------------------------------
| time/ | |
| fps | 1690 |
| iterations | 138 |
| time_elapsed | 1337 |
| total_timesteps | 2260992 |
--------------------------------
------------------------------------------
| time/ | |
| fps | 1696 |
| iterations | 139 |
| time_elapsed | 1342 |
| total_timesteps | 2277376 |
| train/ | |
| approx_kl | 0.0072061084 |
| clip_fraction | 0.0728 |
| clip_range | 0.2 |
| entropy_loss | -3.25 |
| explained_variance | 0.954 |
| learning_rate | 0.0003 |
| loss | -0.0312 |
| n_updates | 1380 |
| policy_gradient_loss | -0.00512 |
| std | 1.23 |
| value_loss | 0.006 |
------------------------------------------
------------------------------------------
| time/ | |
| fps | 1702 |
| iterations | 140 |
| time_elapsed | 1347 |
| total_timesteps | 2293760 |
| train/ | |
| approx_kl | 0.0066916933 |
| clip_fraction | 0.0626 |
| clip_range | 0.2 |
| entropy_loss | -3.24 |
| explained_variance | 0.939 |
| learning_rate | 0.0003 |
| loss | -0.0408 |
| n_updates | 1390 |
| policy_gradient_loss | -0.00463 |
| std | 1.23 |
| value_loss | 0.00827 |
------------------------------------------
Eval num_timesteps=2300000, episode_reward=-43.65 +/- 42.86
Episode length: 2000.00 +/- 0.00
------------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -43.7 |
| time/ | |
| total_timesteps | 2300000 |
| train/ | |
| approx_kl | 0.0062987795 |
| clip_fraction | 0.0609 |
| clip_range | 0.2 |
| entropy_loss | -3.26 |
| explained_variance | 0.898 |
| learning_rate | 0.0003 |
| loss | -0.0316 |
| n_updates | 1400 |
| policy_gradient_loss | -0.00442 |
| std | 1.25 |
| value_loss | 0.00955 |
------------------------------------------
--------------------------------
| time/ | |
| fps | 1691 |
| iterations | 141 |
| time_elapsed | 1365 |
| total_timesteps | 2310144 |
--------------------------------
-----------------------------------------
| time/ | |
| fps | 1696 |
| iterations | 142 |
| time_elapsed | 1371 |
| total_timesteps | 2326528 |
| train/ | |
| approx_kl | 0.005443076 |
| clip_fraction | 0.054 |
| clip_range | 0.2 |
| entropy_loss | -3.27 |
| explained_variance | 0.877 |
| learning_rate | 0.0003 |
| loss | -0.0296 |
| n_updates | 1410 |
| policy_gradient_loss | -0.00375 |
| std | 1.24 |
| value_loss | 0.00928 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 1701 |
| iterations | 143 |
| time_elapsed | 1376 |
| total_timesteps | 2342912 |
| train/ | |
| approx_kl | 0.004740049 |
| clip_fraction | 0.0456 |
| clip_range | 0.2 |
| entropy_loss | -3.26 |
| explained_variance | 0.922 |
| learning_rate | 0.0003 |
| loss | -0.0318 |
| n_updates | 1420 |
| policy_gradient_loss | -0.00351 |
| std | 1.24 |
| value_loss | 0.0156 |
-----------------------------------------
Eval num_timesteps=2350000, episode_reward=-37.57 +/- 37.78
Episode length: 2000.00 +/- 0.00
------------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -37.6 |
| time/ | |
| total_timesteps | 2350000 |
| train/ | |
| approx_kl | 0.0056120222 |
| clip_fraction | 0.0542 |
| clip_range | 0.2 |
| entropy_loss | -3.27 |
| explained_variance | 0.911 |
| learning_rate | 0.0003 |
| loss | -0.0272 |
| n_updates | 1430 |
| policy_gradient_loss | -0.0035 |
| std | 1.25 |
| value_loss | 0.00811 |
------------------------------------------
--------------------------------
| time/ | |
| fps | 1690 |
| iterations | 144 |
| time_elapsed | 1395 |
| total_timesteps | 2359296 |
--------------------------------
------------------------------------------
| time/ | |
| fps | 1695 |
| iterations | 145 |
| time_elapsed | 1401 |
| total_timesteps | 2375680 |
| train/ | |
| approx_kl | 0.0064737825 |
| clip_fraction | 0.0697 |
| clip_range | 0.2 |
| entropy_loss | -3.28 |
| explained_variance | 0.93 |
| learning_rate | 0.0003 |
| loss | -0.036 |
| n_updates | 1440 |
| policy_gradient_loss | -0.00403 |
| std | 1.25 |
| value_loss | 0.00488 |
------------------------------------------
------------------------------------------
| time/ | |
| fps | 1699 |
| iterations | 146 |
| time_elapsed | 1407 |
| total_timesteps | 2392064 |
| train/ | |
| approx_kl | 0.0050720195 |
| clip_fraction | 0.0466 |
| clip_range | 0.2 |
| entropy_loss | -3.29 |
| explained_variance | 0.902 |
| learning_rate | 0.0003 |
| loss | -0.0374 |
| n_updates | 1450 |
| policy_gradient_loss | -0.00283 |
| std | 1.26 |
| value_loss | 0.00958 |
------------------------------------------
Eval num_timesteps=2400000, episode_reward=-42.55 +/- 37.89
Episode length: 2000.00 +/- 0.00
-----------------------------------------
| eval/ | |
| mean_ep_length | 2e+03 |
| mean_reward | -42.6 |
| time/ | |
| total_timesteps | 2400000 |
| train/ | |
| approx_kl | 0.005990128 |
| clip_fraction | 0.0565 |
| clip_range | 0.2 |
| entropy_loss | -3.31 |
| explained_variance | 0.869 |
| learning_rate | 0.0003 |
| loss | -0.0448 |
| n_updates | 1460 |
| policy_gradient_loss | -0.0051 |
| std | 1.27 |
| value_loss | 0.00854 |
-----------------------------------------
[Diag @ 2,400,000 | n_sheep=4 | success=0%]
NEVER_COMPACT 15/20
COMPACT_CANT_DRIVE 5/20
action_mag mean=0.424 p10=0.025 p90=0.948 (0=stopped, 1=full speed)
min_flock_radius mean=7.66m best=1.63m (target <5m to compact)
min_dog_to_com mean=4.77m best=0.32m (FLEE_DIST=7m)
min_com_to_pen mean=14.47m best=8.96m
reward/step (mean): progress=-0.0008 alignment=+0.0000 pen_bonus=+0.0003 step_cost=-0.0200 complete=+0.0000
--------------------------------
| time/ | |
| fps | 1677 |
| iterations | 147 |
| time_elapsed | 1435 |
| total_timesteps | 2408448 |
--------------------------------
Training complete. Artefacts saved to runs/ppo_fix_check/