Using cpu device
Logging to runs/ppo_fix_check2/ppo_1
------------------------------
| time/              |       |
|    fps             | 4605  |
|    iterations      | 1     |
|    time_elapsed    | 3     |
|    total_timesteps | 16384 |
------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 4011         |
|    iterations           | 2            |
|    time_elapsed         | 8            |
|    total_timesteps      | 32768        |
| train/                  |              |
|    approx_kl            | 0.0033352287 |
|    clip_fraction        | 0.0253       |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.83        |
|    explained_variance   | 0.271        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.00687     |
|    n_updates            | 10           |
|    policy_gradient_loss | -0.00103     |
|    std                  | 0.996        |
|    value_loss           | 0.0684       |
------------------------------------------
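The `entropy_loss` and `std` rows move together. SB3's PPO logs `entropy_loss` as the negative mean entropy of the action distribution. For a diagonal Gaussian, each action dimension contributes ½·ln(2πeσ²) nats. The sketch below rests on two assumptions: a 2-D continuous action space (inferred from the numbers, not stated in the log) and one shared σ per dimension. Under those assumptions, σ ≈ 1 predicts about -2.84, which matches the -2.83 to -2.86 seen early on and the drift toward -2.7 as `std` falls below 1. The match is approximate for two reasons. SB3 records `std` after the update and averages it over dimensions. `entropy_loss` is instead averaged over the update's minibatches.

```python
import math


def gaussian_entropy(std: float, action_dim: int) -> float:
    """Differential entropy (nats) of a diagonal Gaussian with the same std in every dimension."""
    # Per-dimension entropy of N(mu, std^2): 0.5 * ln(2 * pi * e * std^2)
    per_dim = 0.5 * math.log(2 * math.pi * math.e * std ** 2)
    return action_dim * per_dim


def expected_entropy_loss(std: float, action_dim: int) -> float:
    # PPO reports entropy_loss as the negative mean entropy of the policy's action distribution
    return -gaussian_entropy(std, action_dim)


# (logged std, logged entropy_loss) pairs read off this log; action_dim=2 is an assumption
for std, logged in [(0.996, -2.83), (1.01, -2.86), (0.933, -2.71)]:
    print(f"std={std}: predicted {expected_entropy_loss(std, 2):.2f}, logged {logged}")
```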
-----------------------------------------
| time/                   |             |
|    fps                  | 3789        |
|    iterations           | 3           |
|    time_elapsed         | 12          |
|    total_timesteps      | 49152       |
| train/                  |             |
|    approx_kl            | 0.005950423 |
|    clip_fraction        | 0.0552      |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.83       |
|    explained_variance   | 0.527       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0153     |
|    n_updates            | 20          |
|    policy_gradient_loss | -0.0029     |
|    std                  | 0.997       |
|    value_loss           | 0.0663      |
-----------------------------------------
/home/jalf/miniconda3/envs/tir/lib/python3.12/site-packages/stable_baselines3/common/evaluation.py:71: UserWarning: Evaluation environment is not wrapped with a ``Monitor`` wrapper. This may result in reporting modified episode lengths and rewards, if other wrappers happen to modify these. Consider wrapping environment first with ``Monitor`` wrapper.
  warnings.warn(
Eval num_timesteps=50000, episode_reward=-25.68 +/- 59.67
Episode length: 1815.95 +/- 456.88
------------------------------------------
| eval/                   |              |
|    mean_ep_length       | 1.82e+03     |
|    mean_reward          | -25.7        |
| time/                   |              |
|    total_timesteps      | 50000        |
| train/                  |              |
|    approx_kl            | 0.0040030424 |
|    clip_fraction        | 0.0356       |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.85        |
|    explained_variance   | 0.421        |
|    learning_rate        | 0.0003       |
|    loss                 | 0.149        |
|    n_updates            | 30           |
|    policy_gradient_loss | -0.00198     |
|    std                  | 1.01         |
|    value_loss           | 0.114        |
------------------------------------------
New best mean reward!
------------------------------
| time/              |       |
|    fps             | 2351  |
|    iterations      | 4     |
|    time_elapsed    | 27    |
|    total_timesteps | 65536 |
------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 2446        |
|    iterations           | 5           |
|    time_elapsed         | 33          |
|    total_timesteps      | 81920       |
| train/                  |             |
|    approx_kl            | 0.005522004 |
|    clip_fraction        | 0.0604      |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.86       |
|    explained_variance   | 0.737       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0301     |
|    n_updates            | 40          |
|    policy_gradient_loss | -0.00434    |
|    std                  | 1.01        |
|    value_loss           | 0.0164      |
-----------------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 2617         |
|    iterations           | 6            |
|    time_elapsed         | 37           |
|    total_timesteps      | 98304        |
| train/                  |              |
|    approx_kl            | 0.0052388343 |
|    clip_fraction        | 0.0463       |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.86        |
|    explained_variance   | 0.626        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0294      |
|    n_updates            | 50           |
|    policy_gradient_loss | -0.00297     |
|    std                  | 1.01         |
|    value_loss           | 0.0597       |
------------------------------------------
Eval num_timesteps=100000, episode_reward=-22.76 +/- 46.60
Episode length: 1900.95 +/- 430.60
-----------------------------------------
| eval/                   |             |
|    mean_ep_length       | 1.9e+03     |
|    mean_reward          | -22.8       |
| time/                   |             |
|    total_timesteps      | 100000      |
| train/                  |             |
|    approx_kl            | 0.005612197 |
|    clip_fraction        | 0.0475      |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.86       |
|    explained_variance   | 0.747       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0261     |
|    n_updates            | 60          |
|    policy_gradient_loss | -0.00393    |
|    std                  | 1.01        |
|    value_loss           | 0.0517      |
-----------------------------------------
New best mean reward!
-------------------------------
| time/              |        |
|    fps             | 2178   |
|    iterations      | 7      |
|    time_elapsed    | 52     |
|    total_timesteps | 114688 |
-------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 2294         |
|    iterations           | 8            |
|    time_elapsed         | 57           |
|    total_timesteps      | 131072       |
| train/                  |              |
|    approx_kl            | 0.0057119504 |
|    clip_fraction        | 0.0541       |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.85        |
|    explained_variance   | 0.896        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0144      |
|    n_updates            | 70           |
|    policy_gradient_loss | -0.00364     |
|    std                  | 1            |
|    value_loss           | 0.0738       |
------------------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 2393        |
|    iterations           | 9           |
|    time_elapsed         | 61          |
|    total_timesteps      | 147456      |
| train/                  |             |
|    approx_kl            | 0.005940904 |
|    clip_fraction        | 0.0565      |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.85       |
|    explained_variance   | 0.89        |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0283     |
|    n_updates            | 80          |
|    policy_gradient_loss | -0.00245    |
|    std                  | 1.01        |
|    value_loss           | 0.0761      |
-----------------------------------------
Eval num_timesteps=150000, episode_reward=-29.37 +/- 28.32
Episode length: 1997.50 +/- 10.90
-----------------------------------------
| eval/                   |             |
|    mean_ep_length       | 2e+03       |
|    mean_reward          | -29.4       |
| time/                   |             |
|    total_timesteps      | 150000      |
| train/                  |             |
|    approx_kl            | 0.004531667 |
|    clip_fraction        | 0.0392      |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.85       |
|    explained_variance   | 0.958       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0343     |
|    n_updates            | 90          |
|    policy_gradient_loss | -0.00379    |
|    std                  | 1.01        |
|    value_loss           | 0.00995     |
-----------------------------------------

[Diag @ 150,000 | n_sheep=1 | success=0%]
  COMPACT_CANT_DRIVE         17/20
  DROVE_NO_SHEEP             3/20
  action_mag mean=0.089 p10=0.003 p90=0.274 (0=stopped, 1=full speed)
  min_flock_radius mean=0.00m best=0.00m  (target <5m to compact)
  min_dog_to_com   mean=4.40m best=2.07m  (FLEE_DIST=7m)
  min_com_to_pen   mean=11.66m best=1.50m
  reward/step (mean): progress=+0.0004  alignment=+0.0000  pen_bonus=+0.0000  step_cost=-0.0200  complete=+0.0000
-------------------------------
| time/              |        |
|    fps             | 1950   |
|    iterations      | 10     |
|    time_elapsed    | 84     |
|    total_timesteps | 163840 |
-------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 2020         |
|    iterations           | 11           |
|    time_elapsed         | 89           |
|    total_timesteps      | 180224       |
| train/                  |              |
|    approx_kl            | 0.0061831754 |
|    clip_fraction        | 0.068        |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.86        |
|    explained_variance   | 0.975        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0349      |
|    n_updates            | 100          |
|    policy_gradient_loss | -0.00607     |
|    std                  | 1.02         |
|    value_loss           | 0.0156       |
------------------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 2084        |
|    iterations           | 12          |
|    time_elapsed         | 94          |
|    total_timesteps      | 196608      |
| train/                  |             |
|    approx_kl            | 0.009407628 |
|    clip_fraction        | 0.123       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.87       |
|    explained_variance   | 0.899       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0305     |
|    n_updates            | 110         |
|    policy_gradient_loss | -0.00932    |
|    std                  | 1.02        |
|    value_loss           | 0.0223      |
-----------------------------------------
Eval num_timesteps=200000, episode_reward=-12.36 +/- 51.37
Episode length: 1880.20 +/- 355.04
-----------------------------------------
| eval/                   |             |
|    mean_ep_length       | 1.88e+03    |
|    mean_reward          | -12.4       |
| time/                   |             |
|    total_timesteps      | 200000      |
| train/                  |             |
|    approx_kl            | 0.008270489 |
|    clip_fraction        | 0.0945      |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.85       |
|    explained_variance   | 0.945       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0339     |
|    n_updates            | 120         |
|    policy_gradient_loss | -0.00809    |
|    std                  | 1           |
|    value_loss           | 0.0162      |
-----------------------------------------
New best mean reward!
-------------------------------
| time/              |        |
|    fps             | 1936   |
|    iterations      | 13     |
|    time_elapsed    | 109    |
|    total_timesteps | 212992 |
-------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1989        |
|    iterations           | 14          |
|    time_elapsed         | 115         |
|    total_timesteps      | 229376      |
| train/                  |             |
|    approx_kl            | 0.008541125 |
|    clip_fraction        | 0.112       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.83       |
|    explained_variance   | 0.944       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0184     |
|    n_updates            | 130         |
|    policy_gradient_loss | -0.00846    |
|    std                  | 0.994       |
|    value_loss           | 0.0284      |
-----------------------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 2037       |
|    iterations           | 15         |
|    time_elapsed         | 120        |
|    total_timesteps      | 245760     |
| train/                  |            |
|    approx_kl            | 0.00763176 |
|    clip_fraction        | 0.0894     |
|    clip_range           | 0.2        |
|    entropy_loss         | -2.81      |
|    explained_variance   | 0.9        |
|    learning_rate        | 0.0003     |
|    loss                 | -0.0128    |
|    n_updates            | 140        |
|    policy_gradient_loss | -0.00655   |
|    std                  | 0.987      |
|    value_loss           | 0.071      |
----------------------------------------
Eval num_timesteps=250000, episode_reward=45.82 +/- 68.33
Episode length: 1391.70 +/- 757.58
-----------------------------------------
| eval/                   |             |
|    mean_ep_length       | 1.39e+03    |
|    mean_reward          | 45.8        |
| time/                   |             |
|    total_timesteps      | 250000      |
| train/                  |             |
|    approx_kl            | 0.009210973 |
|    clip_fraction        | 0.11        |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.81       |
|    explained_variance   | 0.95        |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0401     |
|    n_updates            | 150         |
|    policy_gradient_loss | -0.0082     |
|    std                  | 0.986       |
|    value_loss           | 0.0202      |
-----------------------------------------
New best mean reward!
-------------------------------
| time/              |        |
|    fps             | 1958   |
|    iterations      | 16     |
|    time_elapsed    | 133    |
|    total_timesteps | 262144 |
-------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 2005        |
|    iterations           | 17          |
|    time_elapsed         | 138         |
|    total_timesteps      | 278528      |
| train/                  |             |
|    approx_kl            | 0.008197077 |
|    clip_fraction        | 0.096       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.79       |
|    explained_variance   | 0.949       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0375     |
|    n_updates            | 160         |
|    policy_gradient_loss | -0.00834    |
|    std                  | 0.976       |
|    value_loss           | 0.0207      |
-----------------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 2061        |
|    iterations           | 18          |
|    time_elapsed         | 143         |
|    total_timesteps      | 294912      |
| train/                  |             |
|    approx_kl            | 0.006078005 |
|    clip_fraction        | 0.0598      |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.78       |
|    explained_variance   | 0.965       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0188     |
|    n_updates            | 170         |
|    policy_gradient_loss | -0.00464    |
|    std                  | 0.969       |
|    value_loss           | 0.0178      |
-----------------------------------------
Eval num_timesteps=300000, episode_reward=56.19 +/- 63.26
Episode length: 1246.75 +/- 843.82
------------------------------------------
| eval/                   |              |
|    mean_ep_length       | 1.25e+03     |
|    mean_reward          | 56.2         |
| time/                   |              |
|    total_timesteps      | 300000       |
| train/                  |              |
|    approx_kl            | 0.0056289425 |
|    clip_fraction        | 0.0523       |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.76        |
|    explained_variance   | 0.969        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0246      |
|    n_updates            | 180          |
|    policy_gradient_loss | -0.00378     |
|    std                  | 0.961        |
|    value_loss           | 0.0174       |
------------------------------------------
New best mean reward!

[Diag @ 300,000 | n_sheep=1 | success=40%]
  DROVE_NO_SHEEP             11/20
  SUCCESS                    8/20
  COMPACT_CANT_DRIVE         1/20
  action_mag mean=0.076 p10=0.000 p90=0.193 (0=stopped, 1=full speed)
  min_flock_radius mean=0.00m best=0.00m  (target <5m to compact)
  min_dog_to_com   mean=2.83m best=0.24m  (FLEE_DIST=7m)
  min_com_to_pen   mean=2.99m best=1.50m
  reward/step (mean): progress=+0.0236  alignment=+0.0012  pen_bonus=+0.0029  step_cost=-0.0200  complete=+0.0291
-------------------------------
| time/              |        |
|    fps             | 1939   |
|    iterations      | 19     |
|    time_elapsed    | 160    |
|    total_timesteps | 311296 |
-------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1983        |
|    iterations           | 20          |
|    time_elapsed         | 165         |
|    total_timesteps      | 327680      |
| train/                  |             |
|    approx_kl            | 0.005042998 |
|    clip_fraction        | 0.05        |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.73       |
|    explained_variance   | 0.941       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0242     |
|    n_updates            | 190         |
|    policy_gradient_loss | -0.00399    |
|    std                  | 0.947       |
|    value_loss           | 0.00505     |
-----------------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 2018         |
|    iterations           | 21           |
|    time_elapsed         | 170          |
|    total_timesteps      | 344064       |
| train/                  |              |
|    approx_kl            | 0.0054986854 |
|    clip_fraction        | 0.0569       |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.72        |
|    explained_variance   | 0.942        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0248      |
|    n_updates            | 200          |
|    policy_gradient_loss | -0.00415     |
|    std                  | 0.941        |
|    value_loss           | 0.00784      |
------------------------------------------
Eval num_timesteps=350000, episode_reward=25.08 +/- 61.55
Episode length: 1562.00 +/- 761.23
------------------------------------------
| eval/                   |              |
|    mean_ep_length       | 1.56e+03     |
|    mean_reward          | 25.1         |
| time/                   |              |
|    total_timesteps      | 350000       |
| train/                  |              |
|    approx_kl            | 0.0046333643 |
|    clip_fraction        | 0.0476       |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.71        |
|    explained_variance   | 0.934        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0244      |
|    n_updates            | 210          |
|    policy_gradient_loss | -0.00237     |
|    std                  | 0.934        |
|    value_loss           | 0.00827      |
------------------------------------------
-------------------------------
| time/              |        |
|    fps             | 1950   |
|    iterations      | 22     |
|    time_elapsed    | 184    |
|    total_timesteps | 360448 |
-------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1990        |
|    iterations           | 23          |
|    time_elapsed         | 189         |
|    total_timesteps      | 376832      |
| train/                  |             |
|    approx_kl            | 0.006686668 |
|    clip_fraction        | 0.0757      |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.7        |
|    explained_variance   | 0.963       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0423     |
|    n_updates            | 220         |
|    policy_gradient_loss | -0.00244    |
|    std                  | 0.936       |
|    value_loss           | 0.00575     |
-----------------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 2027        |
|    iterations           | 24          |
|    time_elapsed         | 193         |
|    total_timesteps      | 393216      |
| train/                  |             |
|    approx_kl            | 0.009116547 |
|    clip_fraction        | 0.103       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.71       |
|    explained_variance   | 0.97        |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0353     |
|    n_updates            | 230         |
|    policy_gradient_loss | -0.0042     |
|    std                  | 0.941       |
|    value_loss           | 0.006       |
-----------------------------------------
Eval num_timesteps=400000, episode_reward=56.91 +/- 71.91
Episode length: 1225.25 +/- 861.21
------------------------------------------
| eval/                   |              |
|    mean_ep_length       | 1.23e+03     |
|    mean_reward          | 56.9         |
| time/                   |              |
|    total_timesteps      | 400000       |
| train/                  |              |
|    approx_kl            | 0.0061917743 |
|    clip_fraction        | 0.0658       |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.72        |
|    explained_variance   | 0.975        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0378      |
|    n_updates            | 240          |
|    policy_gradient_loss | -0.00282     |
|    std                  | 0.943        |
|    value_loss           | 0.00633      |
------------------------------------------
New best mean reward!
-------------------------------
| time/              |        |
|    fps             | 1981   |
|    iterations      | 25     |
|    time_elapsed    | 206    |
|    total_timesteps | 409600 |
-------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 2011        |
|    iterations           | 26          |
|    time_elapsed         | 211         |
|    total_timesteps      | 425984      |
| train/                  |             |
|    approx_kl            | 0.007945089 |
|    clip_fraction        | 0.1         |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.73       |
|    explained_variance   | 0.978       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0343     |
|    n_updates            | 250         |
|    policy_gradient_loss | -0.00475    |
|    std                  | 0.95        |
|    value_loss           | 0.00708     |
-----------------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 2044        |
|    iterations           | 27          |
|    time_elapsed         | 216         |
|    total_timesteps      | 442368      |
| train/                  |             |
|    approx_kl            | 0.013059773 |
|    clip_fraction        | 0.152       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.76       |
|    explained_variance   | 0.984       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0421     |
|    n_updates            | 260         |
|    policy_gradient_loss | -0.00542    |
|    std                  | 0.967       |
|    value_loss           | 0.00331     |
-----------------------------------------
Eval num_timesteps=450000, episode_reward=58.80 +/- 74.46
Episode length: 1123.15 +/- 881.85
------------------------------------------
| eval/                   |              |
|    mean_ep_length       | 1.12e+03     |
|    mean_reward          | 58.8         |
| time/                   |              |
|    total_timesteps      | 450000       |
| train/                  |              |
|    approx_kl            | 0.0085322345 |
|    clip_fraction        | 0.0967       |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.77        |
|    explained_variance   | 0.98         |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0264      |
|    n_updates            | 270          |
|    policy_gradient_loss | -0.00612     |
|    std                  | 0.963        |
|    value_loss           | 0.00919      |
------------------------------------------
New best mean reward!

[Diag @ 450,000 | n_sheep=1 | success=65%]
  SUCCESS                    13/20
  DROVE_NO_SHEEP             4/20
  COMPACT_CANT_DRIVE         3/20
  action_mag mean=0.105 p10=0.000 p90=0.272 (0=stopped, 1=full speed)
  min_flock_radius mean=0.00m best=0.00m  (target <5m to compact)
  min_dog_to_com   mean=1.67m best=0.43m  (FLEE_DIST=7m)
  min_com_to_pen   mean=3.26m best=2.29m
  reward/step (mean): progress=+0.0326  alignment=+0.0024  pen_bonus=+0.0076  step_cost=-0.0200  complete=+0.0762
-------------------------------
| time/              |        |
|    fps             | 1974   |
|    iterations      | 28     |
|    time_elapsed    | 232    |
|    total_timesteps | 458752 |
-------------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 2005       |
|    iterations           | 29         |
|    time_elapsed         | 236        |
|    total_timesteps      | 475136     |
| train/                  |            |
|    approx_kl            | 0.01203198 |
|    clip_fraction        | 0.146      |
|    clip_range           | 0.2        |
|    entropy_loss         | -2.79      |
|    explained_variance   | 0.963      |
|    learning_rate        | 0.0003     |
|    loss                 | 0.00738    |
|    n_updates            | 280        |
|    policy_gradient_loss | -0.0128    |
|    std                  | 0.982      |
|    value_loss           | 0.0749     |
----------------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 2037         |
|    iterations           | 30           |
|    time_elapsed         | 241          |
|    total_timesteps      | 491520       |
| train/                  |              |
|    approx_kl            | 0.0078244675 |
|    clip_fraction        | 0.0856       |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.8         |
|    explained_variance   | 0.937        |
|    learning_rate        | 0.0003       |
|    loss                 | 0.0631       |
|    n_updates            | 290          |
|    policy_gradient_loss | -0.00651     |
|    std                  | 0.977        |
|    value_loss           | 0.131        |
------------------------------------------
Eval num_timesteps=500000, episode_reward=135.29 +/- 9.81
Episode length: 287.30 +/- 88.71
----------------------------------------
| eval/                   |            |
|    mean_ep_length       | 287        |
|    mean_reward          | 135        |
| time/                   |            |
|    total_timesteps      | 500000     |
| train/                  |            |
|    approx_kl            | 0.00837522 |
|    clip_fraction        | 0.0866     |
|    clip_range           | 0.2        |
|    entropy_loss         | -2.77      |
|    explained_variance   | 0.948      |
|    learning_rate        | 0.0003     |
|    loss                 | 0.041      |
|    n_updates            | 300        |
|    policy_gradient_loss | -0.00532   |
|    std                  | 0.962      |
|    value_loss           | 0.0898     |
----------------------------------------
New best mean reward!
-------------------------------
| time/              |        |
|    fps             | 2048   |
|    iterations      | 31     |
|    time_elapsed    | 247    |
|    total_timesteps | 507904 |
-------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 2070         |
|    iterations           | 32           |
|    time_elapsed         | 253          |
|    total_timesteps      | 524288       |
| train/                  |              |
|    approx_kl            | 0.0067581255 |
|    clip_fraction        | 0.0543       |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.75        |
|    explained_variance   | 0.932        |
|    learning_rate        | 0.0003       |
|    loss                 | 0.0518       |
|    n_updates            | 310          |
|    policy_gradient_loss | -0.00297     |
|    std                  | 0.954        |
|    value_loss           | 0.111        |
------------------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 2090         |
|    iterations           | 33           |
|    time_elapsed         | 258          |
|    total_timesteps      | 540672       |
| train/                  |              |
|    approx_kl            | 0.0066835573 |
|    clip_fraction        | 0.0597       |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.74        |
|    explained_variance   | 0.934        |
|    learning_rate        | 0.0003       |
|    loss                 | 0.00545      |
|    n_updates            | 320          |
|    policy_gradient_loss | -0.00508     |
|    std                  | 0.949        |
|    value_loss           | 0.101        |
------------------------------------------
Eval num_timesteps=550000, episode_reward=136.08 +/- 11.93
Episode length: 285.80 +/- 123.59
------------------------------------------
| eval/                   |              |
|    mean_ep_length       | 286          |
|    mean_reward          | 136          |
| time/                   |              |
|    total_timesteps      | 550000       |
| train/                  |              |
|    approx_kl            | 0.0062076193 |
|    clip_fraction        | 0.0672       |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.71        |
|    explained_variance   | 0.942        |
|    learning_rate        | 0.0003       |
|    loss                 | 0.0229       |
|    n_updates            | 330          |
|    policy_gradient_loss | -0.00616     |
|    std                  | 0.933        |
|    value_loss           | 0.0813       |
------------------------------------------
New best mean reward!
-------------------------------
| time/              |        |
|    fps             | 2104   |
|    iterations      | 34     |
|    time_elapsed    | 264    |
|    total_timesteps | 557056 |
-------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 2130         |
|    iterations           | 35           |
|    time_elapsed         | 269          |
|    total_timesteps      | 573440       |
| train/                  |              |
|    approx_kl            | 0.0064913128 |
|    clip_fraction        | 0.0631       |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.67        |
|    explained_variance   | 0.971        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0199      |
|    n_updates            | 340          |
|    policy_gradient_loss | -0.00631     |
|    std                  | 0.917        |
|    value_loss           | 0.0185       |
------------------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 2155         |
|    iterations           | 36           |
|    time_elapsed         | 273          |
|    total_timesteps      | 589824       |
| train/                  |              |
|    approx_kl            | 0.0067110434 |
|    clip_fraction        | 0.0719       |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.63        |
|    explained_variance   | 0.98         |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0343      |
|    n_updates            | 350          |
|    policy_gradient_loss | -0.0069      |
|    std                  | 0.897        |
|    value_loss           | 0.0113       |
------------------------------------------
Eval num_timesteps=600000, episode_reward=135.45 +/- 12.96
Episode length: 273.05 +/- 118.26
------------------------------------------
| eval/                   |              |
|    mean_ep_length       | 273          |
|    mean_reward          | 135          |
| time/                   |              |
|    total_timesteps      | 600000       |
| train/                  |              |
|    approx_kl            | 0.0054842415 |
|    clip_fraction        | 0.0564       |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.59        |
|    explained_variance   | 0.983        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.033       |
|    n_updates            | 360          |
|    policy_gradient_loss | -0.0042      |
|    std                  | 0.883        |
|    value_loss           | 0.00479      |
------------------------------------------

[Diag @ 600,000 | n_sheep=1 | success=100%]
  SUCCESS                    20/20
  action_mag mean=0.343 p10=0.232 p90=0.548 (0=stopped, 1=full speed)
  min_flock_radius mean=0.00m best=0.00m  (target <5m to compact)
  min_dog_to_com   mean=1.53m best=0.76m  (FLEE_DIST=7m)
  min_com_to_pen   mean=3.49m best=2.84m
  reward/step (mean): progress=+0.1066  alignment=+0.0088  pen_bonus=+0.0357  step_cost=-0.0200  complete=+0.3567

[Curriculum] leaving stage n_sheep=1 after 600,000 steps | training success rate (last 100 eps) = 100%
[Curriculum] → 2 sheep at step 600,000

-------------------------------
| time/              |        |
|    fps             | 2156   |
|    iterations      | 37     |
|    time_elapsed    | 281    |
|    total_timesteps | 606208 |
-------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 2173        |
|    iterations           | 38          |
|    time_elapsed         | 286         |
|    total_timesteps      | 622592      |
| train/                  |             |
|    approx_kl            | 0.011170821 |
|    clip_fraction        | 0.117       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.59       |
|    explained_variance   | 0.924       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0137     |
|    n_updates            | 370         |
|    policy_gradient_loss | 0.00714     |
|    std                  | 0.886       |
|    value_loss           | 0.0417      |
-----------------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 2192        |
|    iterations           | 39          |
|    time_elapsed         | 291         |
|    total_timesteps      | 638976      |
| train/                  |             |
|    approx_kl            | 0.012632904 |
|    clip_fraction        | 0.156       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.6        |
|    explained_variance   | 0.858       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.00445    |
|    n_updates            | 380         |
|    policy_gradient_loss | 0.00112     |
|    std                  | 0.892       |
|    value_loss           | 0.0156      |
-----------------------------------------
Eval num_timesteps=650000, episode_reward=-38.36 +/- 29.94
Episode length: 2000.00 +/- 0.00
-----------------------------------------
| eval/                   |             |
|    mean_ep_length       | 2e+03       |
|    mean_reward          | -38.4       |
| time/                   |             |
|    total_timesteps      | 650000      |
| train/                  |             |
|    approx_kl            | 0.012015635 |
|    clip_fraction        | 0.133       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.62       |
|    explained_variance   | 0.946       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0168     |
|    n_updates            | 390         |
|    policy_gradient_loss | -0.000726   |
|    std                  | 0.904       |
|    value_loss           | 0.0126      |
-----------------------------------------
-------------------------------
| time/              |        |
|    fps             | 2131   |
|    iterations      | 40     |
|    time_elapsed    | 307    |
|    total_timesteps | 655360 |
-------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 2145        |
|    iterations           | 41          |
|    time_elapsed         | 313         |
|    total_timesteps      | 671744      |
| train/                  |             |
|    approx_kl            | 0.009391339 |
|    clip_fraction        | 0.121       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.63       |
|    explained_variance   | 0.955       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0164     |
|    n_updates            | 400         |
|    policy_gradient_loss | -0.00177    |
|    std                  | 0.905       |
|    value_loss           | 0.00536     |
-----------------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 2156         |
|    iterations           | 42           |
|    time_elapsed         | 319          |
|    total_timesteps      | 688128       |
| train/                  |              |
|    approx_kl            | 0.0077482145 |
|    clip_fraction        | 0.0977       |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.64        |
|    explained_variance   | 0.895        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.023       |
|    n_updates            | 410          |
|    policy_gradient_loss | -0.00158     |
|    std                  | 0.908        |
|    value_loss           | 0.0068       |
------------------------------------------
Eval num_timesteps=700000, episode_reward=-16.26 +/- 48.54
Episode length: 1934.20 +/- 286.82
-----------------------------------------
| eval/                   |             |
|    mean_ep_length       | 1.93e+03    |
|    mean_reward          | -16.3       |
| time/                   |             |
|    total_timesteps      | 700000      |
| train/                  |             |
|    approx_kl            | 0.007948186 |
|    clip_fraction        | 0.0933      |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.64       |
|    explained_variance   | 0.934       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0205     |
|    n_updates            | 420         |
|    policy_gradient_loss | -0.00233    |
|    std                  | 0.904       |
|    value_loss           | 0.00556     |
-----------------------------------------
-------------------------------
| time/              |        |
|    fps             | 2093   |
|    iterations      | 43     |
|    time_elapsed    | 336    |
|    total_timesteps | 704512 |
-------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 2109         |
|    iterations           | 44           |
|    time_elapsed         | 341          |
|    total_timesteps      | 720896       |
| train/                  |              |
|    approx_kl            | 0.0077707805 |
|    clip_fraction        | 0.101        |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.64        |
|    explained_variance   | 0.929        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.00469     |
|    n_updates            | 430          |
|    policy_gradient_loss | -0.00226     |
|    std                  | 0.909        |
|    value_loss           | 0.0031       |
------------------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 2129         |
|    iterations           | 45           |
|    time_elapsed         | 346          |
|    total_timesteps      | 737280       |
| train/                  |              |
|    approx_kl            | 0.0063995067 |
|    clip_fraction        | 0.0823       |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.66        |
|    explained_variance   | 0.951        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0249      |
|    n_updates            | 440          |
|    policy_gradient_loss | -0.00261     |
|    std                  | 0.922        |
|    value_loss           | 0.00343      |
------------------------------------------
Eval num_timesteps=750000, episode_reward=-12.10 +/- 56.78
Episode length: 1850.50 +/- 449.09
------------------------------------------
| eval/                   |              |
|    mean_ep_length       | 1.85e+03     |
|    mean_reward          | -12.1        |
| time/                   |              |
|    total_timesteps      | 750000       |
| train/                  |              |
|    approx_kl            | 0.0069549307 |
|    clip_fraction        | 0.0847       |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.68        |
|    explained_variance   | 0.862        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0192      |
|    n_updates            | 450          |
|    policy_gradient_loss | -0.00165     |
|    std                  | 0.929        |
|    value_loss           | 0.0032       |
------------------------------------------

[Diag @ 750,000 | n_sheep=2 | success=5%]
  COMPACT_CANT_DRIVE         9/20
  NEVER_COMPACT              9/20
  PARTIAL_1of2               1/20
  SUCCESS                    1/20
  action_mag mean=0.261 p10=0.002 p90=0.983 (0=stopped, 1=full speed)
  min_flock_radius mean=3.93m best=0.00m  (target <5m to compact)
  min_dog_to_com   mean=0.79m best=0.07m  (FLEE_DIST=7m)
  min_com_to_pen   mean=13.43m best=1.62m
  reward/step (mean): progress=-0.0058  alignment=+0.0087  pen_bonus=+0.0008  step_cost=-0.0200  complete=+0.0025
-------------------------------
| time/              |        |
|    fps             | 2043   |
|    iterations      | 46     |
|    time_elapsed    | 368    |
|    total_timesteps | 753664 |
-------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 2062        |
|    iterations           | 47          |
|    time_elapsed         | 373         |
|    total_timesteps      | 770048      |
| train/                  |             |
|    approx_kl            | 0.008165602 |
|    clip_fraction        | 0.0997      |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.69       |
|    explained_variance   | 0.931       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0461     |
|    n_updates            | 460         |
|    policy_gradient_loss | -0.00412    |
|    std                  | 0.932       |
|    value_loss           | 0.00308     |
-----------------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 2074        |
|    iterations           | 48          |
|    time_elapsed         | 379         |
|    total_timesteps      | 786432      |
| train/                  |             |
|    approx_kl            | 0.006088208 |
|    clip_fraction        | 0.0805      |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.71       |
|    explained_variance   | 0.917       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.034      |
|    n_updates            | 470         |
|    policy_gradient_loss | -0.000257   |
|    std                  | 0.943       |
|    value_loss           | 0.00533     |
-----------------------------------------
Eval num_timesteps=800000, episode_reward=-32.78 +/- 23.33
Episode length: 2000.00 +/- 0.00
------------------------------------------
| eval/                   |              |
|    mean_ep_length       | 2e+03        |
|    mean_reward          | -32.8        |
| time/                   |              |
|    total_timesteps      | 800000       |
| train/                  |              |
|    approx_kl            | 0.0069386996 |
|    clip_fraction        | 0.0883       |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.73        |
|    explained_variance   | 0.954        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0361      |
|    n_updates            | 480          |
|    policy_gradient_loss | -0.00228     |
|    std                  | 0.948        |
|    value_loss           | 0.00495      |
------------------------------------------
-------------------------------
| time/              |        |
|    fps             | 2028   |
|    iterations      | 49     |
|    time_elapsed    | 395    |
|    total_timesteps | 802816 |
-------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 2045         |
|    iterations           | 50           |
|    time_elapsed         | 400          |
|    total_timesteps      | 819200       |
| train/                  |              |
|    approx_kl            | 0.0070893797 |
|    clip_fraction        | 0.0687       |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.74        |
|    explained_variance   | 0.955        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.035       |
|    n_updates            | 490          |
|    policy_gradient_loss | -0.00221     |
|    std                  | 0.954        |
|    value_loss           | 0.00229      |
------------------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 2060         |
|    iterations           | 51           |
|    time_elapsed         | 405          |
|    total_timesteps      | 835584       |
| train/                  |              |
|    approx_kl            | 0.0068652867 |
|    clip_fraction        | 0.0787       |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.75        |
|    explained_variance   | 0.863        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0337      |
|    n_updates            | 500          |
|    policy_gradient_loss | -0.00277     |
|    std                  | 0.959        |
|    value_loss           | 0.00229      |
------------------------------------------
Eval num_timesteps=850000, episode_reward=-14.34 +/- 48.77
Episode length: 1998.40 +/- 6.97
-----------------------------------------
| eval/                   |             |
|    mean_ep_length       | 2e+03       |
|    mean_reward          | -14.3       |
| time/                   |             |
|    total_timesteps      | 850000      |
| train/                  |             |
|    approx_kl            | 0.007872021 |
|    clip_fraction        | 0.0815      |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.76       |
|    explained_variance   | 0.852       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0358     |
|    n_updates            | 510         |
|    policy_gradient_loss | -0.00365    |
|    std                  | 0.966       |
|    value_loss           | 0.00272     |
-----------------------------------------
-------------------------------
| time/              |        |
|    fps             | 2018   |
|    iterations      | 52     |
|    time_elapsed    | 422    |
|    total_timesteps | 851968 |
-------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 2032        |
|    iterations           | 53          |
|    time_elapsed         | 427         |
|    total_timesteps      | 868352      |
| train/                  |             |
|    approx_kl            | 0.007002457 |
|    clip_fraction        | 0.0752      |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.78       |
|    explained_variance   | 0.879       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0414     |
|    n_updates            | 520         |
|    policy_gradient_loss | -0.00242    |
|    std                  | 0.977       |
|    value_loss           | 0.00166     |
-----------------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 2047        |
|    iterations           | 54          |
|    time_elapsed         | 432         |
|    total_timesteps      | 884736      |
| train/                  |             |
|    approx_kl            | 0.007822147 |
|    clip_fraction        | 0.0813      |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.8        |
|    explained_variance   | 0.871       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0376     |
|    n_updates            | 530         |
|    policy_gradient_loss | -0.00362    |
|    std                  | 0.984       |
|    value_loss           | 0.00212     |
-----------------------------------------
Eval num_timesteps=900000, episode_reward=-20.41 +/- 60.01
Episode length: 1929.40 +/- 284.99
----------------------------------------
| eval/                   |            |
|    mean_ep_length       | 1.93e+03   |
|    mean_reward          | -20.4      |
| time/                   |            |
|    total_timesteps      | 900000     |
| train/                  |            |
|    approx_kl            | 0.00738756 |
|    clip_fraction        | 0.0793     |
|    clip_range           | 0.2        |
|    entropy_loss         | -2.81      |
|    explained_variance   | 0.808      |
|    learning_rate        | 0.0003     |
|    loss                 | -0.0355    |
|    n_updates            | 540        |
|    policy_gradient_loss | -0.00195   |
|    std                  | 0.988      |
|    value_loss           | 0.00721    |
----------------------------------------

[Diag @ 900,000 | n_sheep=2 | success=5%]
  COMPACT_CANT_DRIVE         11/20
  NEVER_COMPACT              8/20
  SUCCESS                    1/20
  action_mag mean=0.203 p10=0.007 p90=0.704 (0=stopped, 1=full speed)
  min_flock_radius mean=3.40m best=0.00m  (target <5m to compact)
  min_dog_to_com   mean=0.60m best=0.11m  (FLEE_DIST=7m)
  min_com_to_pen   mean=14.01m best=3.61m
  reward/step (mean): progress=-0.0040  alignment=+0.0071  pen_bonus=+0.0008  step_cost=-0.0200  complete=+0.0026
-------------------------------
| time/              |        |
|    fps             | 1977   |
|    iterations      | 55     |
|    time_elapsed    | 455    |
|    total_timesteps | 901120 |
-------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1990        |
|    iterations           | 56          |
|    time_elapsed         | 460         |
|    total_timesteps      | 917504      |
| train/                  |             |
|    approx_kl            | 0.007000256 |
|    clip_fraction        | 0.0831      |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.8        |
|    explained_variance   | 0.889       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0285     |
|    n_updates            | 550         |
|    policy_gradient_loss | -0.00402    |
|    std                  | 0.984       |
|    value_loss           | 0.00171     |
-----------------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 2005        |
|    iterations           | 57          |
|    time_elapsed         | 465         |
|    total_timesteps      | 933888      |
| train/                  |             |
|    approx_kl            | 0.007749311 |
|    clip_fraction        | 0.0755      |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.83       |
|    explained_variance   | 0.599       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.032      |
|    n_updates            | 560         |
|    policy_gradient_loss | -0.00239    |
|    std                  | 1.01        |
|    value_loss           | 0.00351     |
-----------------------------------------
Eval num_timesteps=950000, episode_reward=-13.16 +/- 44.70
Episode length: 1949.30 +/- 221.00
------------------------------------------
| eval/                   |              |
|    mean_ep_length       | 1.95e+03     |
|    mean_reward          | -13.2        |
| time/                   |              |
|    total_timesteps      | 950000       |
| train/                  |              |
|    approx_kl            | 0.0075328955 |
|    clip_fraction        | 0.0829       |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.85        |
|    explained_variance   | 0.783        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0306      |
|    n_updates            | 570          |
|    policy_gradient_loss | -0.00352     |
|    std                  | 1.01         |
|    value_loss           | 0.00319      |
------------------------------------------
-------------------------------
| time/              |        |
|    fps             | 1971   |
|    iterations      | 58     |
|    time_elapsed    | 482    |
|    total_timesteps | 950272 |
-------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 1981         |
|    iterations           | 59           |
|    time_elapsed         | 487          |
|    total_timesteps      | 966656       |
| train/                  |              |
|    approx_kl            | 0.0072506005 |
|    clip_fraction        | 0.0835       |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.86        |
|    explained_variance   | 0.929        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0291      |
|    n_updates            | 580          |
|    policy_gradient_loss | -0.00173     |
|    std                  | 1.01         |
|    value_loss           | 0.00491      |
------------------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 1991         |
|    iterations           | 60           |
|    time_elapsed         | 493          |
|    total_timesteps      | 983040       |
| train/                  |              |
|    approx_kl            | 0.0068104668 |
|    clip_fraction        | 0.0799       |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.87        |
|    explained_variance   | 0.813        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0282      |
|    n_updates            | 590          |
|    policy_gradient_loss | -0.00162     |
|    std                  | 1.02         |
|    value_loss           | 0.00477      |
------------------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 2005        |
|    iterations           | 61          |
|    time_elapsed         | 498         |
|    total_timesteps      | 999424      |
| train/                  |             |
|    approx_kl            | 0.007103944 |
|    clip_fraction        | 0.0774      |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.88       |
|    explained_variance   | 0.942       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0322     |
|    n_updates            | 600         |
|    policy_gradient_loss | -0.00143    |
|    std                  | 1.03        |
|    value_loss           | 0.0033      |
-----------------------------------------
Eval num_timesteps=1000000, episode_reward=-25.58 +/- 49.00
Episode length: 1999.50 +/- 2.18
------------------------------------------
| eval/                   |              |
|    mean_ep_length       | 2e+03        |
|    mean_reward          | -25.6        |
| time/                   |              |
|    total_timesteps      | 1000000      |
| train/                  |              |
|    approx_kl            | 0.0075788023 |
|    clip_fraction        | 0.088        |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.9         |
|    explained_variance   | 0.864        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0352      |
|    n_updates            | 610          |
|    policy_gradient_loss | -0.003       |
|    std                  | 1.04         |
|    value_loss           | 0.00192      |
------------------------------------------
--------------------------------
| time/              |         |
|    fps             | 1971    |
|    iterations      | 62      |
|    time_elapsed    | 515     |
|    total_timesteps | 1015808 |
--------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1983        |
|    iterations           | 63          |
|    time_elapsed         | 520         |
|    total_timesteps      | 1032192     |
| train/                  |             |
|    approx_kl            | 0.009131588 |
|    clip_fraction        | 0.0902      |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.89       |
|    explained_variance   | 0.941       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0476     |
|    n_updates            | 620         |
|    policy_gradient_loss | -0.00341    |
|    std                  | 1.03        |
|    value_loss           | 0.00705     |
-----------------------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 1995       |
|    iterations           | 64         |
|    time_elapsed         | 525        |
|    total_timesteps      | 1048576    |
| train/                  |            |
|    approx_kl            | 0.00746674 |
|    clip_fraction        | 0.0838     |
|    clip_range           | 0.2        |
|    entropy_loss         | -2.89      |
|    explained_variance   | 0.958      |
|    learning_rate        | 0.0003     |
|    loss                 | -0.022     |
|    n_updates            | 630        |
|    policy_gradient_loss | -0.00392   |
|    std                  | 1.03       |
|    value_loss           | 0.00592    |
----------------------------------------
Eval num_timesteps=1050000, episode_reward=-12.04 +/- 64.56
Episode length: 1889.90 +/- 333.38
------------------------------------------
| eval/                   |              |
|    mean_ep_length       | 1.89e+03     |
|    mean_reward          | -12          |
| time/                   |              |
|    total_timesteps      | 1050000      |
| train/                  |              |
|    approx_kl            | 0.0058071706 |
|    clip_fraction        | 0.0721       |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.9         |
|    explained_variance   | 0.932        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0188      |
|    n_updates            | 640          |
|    policy_gradient_loss | -0.00235     |
|    std                  | 1.03         |
|    value_loss           | 0.00513      |
------------------------------------------

[Diag @ 1,050,000 | n_sheep=2 | success=5%]
  COMPACT_CANT_DRIVE         10/20
  NEVER_COMPACT              9/20
  SUCCESS                    1/20
  action_mag mean=0.190 p10=0.001 p90=0.686 (0=stopped, 1=full speed)
  min_flock_radius mean=4.60m best=0.00m  (target <5m to compact)
  min_dog_to_com   mean=0.54m best=0.21m  (FLEE_DIST=7m)
  min_com_to_pen   mean=13.05m best=3.62m
  reward/step (mean): progress=-0.0023  alignment=+0.0072  pen_bonus=+0.0005  step_cost=-0.0200  complete=+0.0025
--------------------------------
| time/              |         |
|    fps             | 1931    |
|    iterations      | 65      |
|    time_elapsed    | 551     |
|    total_timesteps | 1064960 |
--------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1944        |
|    iterations           | 66          |
|    time_elapsed         | 556         |
|    total_timesteps      | 1081344     |
| train/                  |             |
|    approx_kl            | 0.006802067 |
|    clip_fraction        | 0.0701      |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.92       |
|    explained_variance   | 0.937       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0304     |
|    n_updates            | 650         |
|    policy_gradient_loss | -0.0019     |
|    std                  | 1.04        |
|    value_loss           | 0.00206     |
-----------------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1956        |
|    iterations           | 67          |
|    time_elapsed         | 561         |
|    total_timesteps      | 1097728     |
| train/                  |             |
|    approx_kl            | 0.007102525 |
|    clip_fraction        | 0.074       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.92       |
|    explained_variance   | 0.953       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.00869    |
|    n_updates            | 660         |
|    policy_gradient_loss | -0.00208    |
|    std                  | 1.04        |
|    value_loss           | 0.00579     |
-----------------------------------------
Eval num_timesteps=1100000, episode_reward=-29.51 +/- 23.80
Episode length: 2000.00 +/- 0.00
-----------------------------------------
| eval/                   |             |
|    mean_ep_length       | 2e+03       |
|    mean_reward          | -29.5       |
| time/                   |             |
|    total_timesteps      | 1100000     |
| train/                  |             |
|    approx_kl            | 0.006372301 |
|    clip_fraction        | 0.0669      |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.94       |
|    explained_variance   | 0.829       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0349     |
|    n_updates            | 670         |
|    policy_gradient_loss | -0.00135    |
|    std                  | 1.06        |
|    value_loss           | 0.00208     |
-----------------------------------------
--------------------------------
| time/              |         |
|    fps             | 1932    |
|    iterations      | 68      |
|    time_elapsed    | 576     |
|    total_timesteps | 1114112 |
--------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1942        |
|    iterations           | 69          |
|    time_elapsed         | 581         |
|    total_timesteps      | 1130496     |
| train/                  |             |
|    approx_kl            | 0.007083354 |
|    clip_fraction        | 0.0839      |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.95       |
|    explained_variance   | 0.845       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0464     |
|    n_updates            | 680         |
|    policy_gradient_loss | -0.00298    |
|    std                  | 1.06        |
|    value_loss           | 0.00747     |
-----------------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1954        |
|    iterations           | 70          |
|    time_elapsed         | 586         |
|    total_timesteps      | 1146880     |
| train/                  |             |
|    approx_kl            | 0.007034454 |
|    clip_fraction        | 0.0875      |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.96       |
|    explained_variance   | 0.892       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0382     |
|    n_updates            | 690         |
|    policy_gradient_loss | -0.00359    |
|    std                  | 1.06        |
|    value_loss           | 0.00208     |
-----------------------------------------
Eval num_timesteps=1150000, episode_reward=-20.98 +/- 49.18
Episode length: 1959.70 +/- 175.66
-----------------------------------------
| eval/                   |             |
|    mean_ep_length       | 1.96e+03    |
|    mean_reward          | -21         |
| time/                   |             |
|    total_timesteps      | 1150000     |
| train/                  |             |
|    approx_kl            | 0.006192833 |
|    clip_fraction        | 0.0626      |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.94       |
|    explained_variance   | 0.951       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0224     |
|    n_updates            | 700         |
|    policy_gradient_loss | -0.00299    |
|    std                  | 1.05        |
|    value_loss           | 0.00883     |
-----------------------------------------
--------------------------------
| time/              |         |
|    fps             | 1926    |
|    iterations      | 71      |
|    time_elapsed    | 603     |
|    total_timesteps | 1163264 |
--------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1937        |
|    iterations           | 72          |
|    time_elapsed         | 608         |
|    total_timesteps      | 1179648     |
| train/                  |             |
|    approx_kl            | 0.008185772 |
|    clip_fraction        | 0.0969      |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.96       |
|    explained_variance   | 0.944       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0278     |
|    n_updates            | 710         |
|    policy_gradient_loss | -0.00316    |
|    std                  | 1.07        |
|    value_loss           | 0.00421     |
-----------------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 1947         |
|    iterations           | 73           |
|    time_elapsed         | 614          |
|    total_timesteps      | 1196032      |
| train/                  |              |
|    approx_kl            | 0.0063469247 |
|    clip_fraction        | 0.065        |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.96        |
|    explained_variance   | 0.912        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0239      |
|    n_updates            | 720          |
|    policy_gradient_loss | -0.00224     |
|    std                  | 1.06         |
|    value_loss           | 0.0054       |
------------------------------------------
Eval num_timesteps=1200000, episode_reward=-29.34 +/- 18.71
Episode length: 2000.00 +/- 0.00
----------------------------------------
| eval/                   |            |
|    mean_ep_length       | 2e+03      |
|    mean_reward          | -29.3      |
| time/                   |            |
|    total_timesteps      | 1200000    |
| train/                  |            |
|    approx_kl            | 0.00778389 |
|    clip_fraction        | 0.0734     |
|    clip_range           | 0.2        |
|    entropy_loss         | -2.95      |
|    explained_variance   | 0.961      |
|    learning_rate        | 0.0003     |
|    loss                 | -0.0435    |
|    n_updates            | 730        |
|    policy_gradient_loss | -0.00184   |
|    std                  | 1.06       |
|    value_loss           | 0.0048     |
----------------------------------------

[Diag @ 1,200,000 | n_sheep=2 | success=10%]
  NEVER_COMPACT              9/20
  COMPACT_CANT_DRIVE         9/20
  SUCCESS                    2/20
  action_mag mean=0.198 p10=0.002 p90=0.744 (0=stopped, 1=full speed)
  min_flock_radius mean=3.94m best=0.00m  (target <5m to compact)
  min_dog_to_com   mean=0.50m best=0.14m  (FLEE_DIST=7m)
  min_com_to_pen   mean=11.36m best=3.58m
  reward/step (mean): progress=-0.0002  alignment=+0.0073  pen_bonus=+0.0013  step_cost=-0.0200  complete=+0.0053

[Curriculum] leaving stage n_sheep=2 after 600,000 steps | training success rate (last 100 eps) = 5%
[Curriculum] → 3 sheep at step 1,200,000

--------------------------------
| time/              |         |
|    fps             | 1898    |
|    iterations      | 74      |
|    time_elapsed    | 638     |
|    total_timesteps | 1212416 |
--------------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 1909       |
|    iterations           | 75         |
|    time_elapsed         | 643        |
|    total_timesteps      | 1228800    |
| train/                  |            |
|    approx_kl            | 0.00918101 |
|    clip_fraction        | 0.106      |
|    clip_range           | 0.2        |
|    entropy_loss         | -2.95      |
|    explained_variance   | 0.919      |
|    learning_rate        | 0.0003     |
|    loss                 | -0.0112    |
|    n_updates            | 740        |
|    policy_gradient_loss | -0.00123   |
|    std                  | 1.06       |
|    value_loss           | 0.0427     |
----------------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1917        |
|    iterations           | 76          |
|    time_elapsed         | 649         |
|    total_timesteps      | 1245184     |
| train/                  |             |
|    approx_kl            | 0.010076641 |
|    clip_fraction        | 0.137       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.94       |
|    explained_variance   | 0.919       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0229     |
|    n_updates            | 750         |
|    policy_gradient_loss | -0.000617   |
|    std                  | 1.05        |
|    value_loss           | 0.0222      |
-----------------------------------------
Eval num_timesteps=1250000, episode_reward=-38.73 +/- 33.85
Episode length: 2000.00 +/- 0.00
---------------------------------------
| eval/                   |           |
|    mean_ep_length       | 2e+03     |
|    mean_reward          | -38.7     |
| time/                   |           |
|    total_timesteps      | 1250000   |
| train/                  |           |
|    approx_kl            | 0.0084493 |
|    clip_fraction        | 0.109     |
|    clip_range           | 0.2       |
|    entropy_loss         | -2.96     |
|    explained_variance   | 0.96      |
|    learning_rate        | 0.0003    |
|    loss                 | -0.0259   |
|    n_updates            | 760       |
|    policy_gradient_loss | -0.00168  |
|    std                  | 1.06      |
|    value_loss           | 0.0024    |
---------------------------------------
--------------------------------
| time/              |         |
|    fps             | 1890    |
|    iterations      | 77      |
|    time_elapsed    | 667     |
|    total_timesteps | 1261568 |
--------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1899        |
|    iterations           | 78          |
|    time_elapsed         | 672         |
|    total_timesteps      | 1277952     |
| train/                  |             |
|    approx_kl            | 0.008724872 |
|    clip_fraction        | 0.109       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.98       |
|    explained_variance   | 0.931       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0293     |
|    n_updates            | 770         |
|    policy_gradient_loss | -0.00204    |
|    std                  | 1.08        |
|    value_loss           | 0.0067      |
-----------------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1906        |
|    iterations           | 79          |
|    time_elapsed         | 678         |
|    total_timesteps      | 1294336     |
| train/                  |             |
|    approx_kl            | 0.008191848 |
|    clip_fraction        | 0.096       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.99       |
|    explained_variance   | 0.963       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0247     |
|    n_updates            | 780         |
|    policy_gradient_loss | -0.002      |
|    std                  | 1.08        |
|    value_loss           | 0.00632     |
-----------------------------------------
Eval num_timesteps=1300000, episode_reward=-26.68 +/- 27.12
Episode length: 2000.00 +/- 0.00
-----------------------------------------
| eval/                   |             |
|    mean_ep_length       | 2e+03       |
|    mean_reward          | -26.7       |
| time/                   |             |
|    total_timesteps      | 1300000     |
| train/                  |             |
|    approx_kl            | 0.006018152 |
|    clip_fraction        | 0.0869      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3          |
|    explained_variance   | 0.96        |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0311     |
|    n_updates            | 790         |
|    policy_gradient_loss | -0.00129    |
|    std                  | 1.09        |
|    value_loss           | 0.00189     |
-----------------------------------------
--------------------------------
| time/              |         |
|    fps             | 1881    |
|    iterations      | 80      |
|    time_elapsed    | 696     |
|    total_timesteps | 1310720 |
--------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 1892         |
|    iterations           | 81           |
|    time_elapsed         | 701          |
|    total_timesteps      | 1327104      |
| train/                  |              |
|    approx_kl            | 0.0077671953 |
|    clip_fraction        | 0.082        |
|    clip_range           | 0.2          |
|    entropy_loss         | -3.01        |
|    explained_variance   | 0.972        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0308      |
|    n_updates            | 800          |
|    policy_gradient_loss | -0.00219     |
|    std                  | 1.09         |
|    value_loss           | 0.00177      |
------------------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1902        |
|    iterations           | 82          |
|    time_elapsed         | 706         |
|    total_timesteps      | 1343488     |
| train/                  |             |
|    approx_kl            | 0.008806022 |
|    clip_fraction        | 0.0947      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.02       |
|    explained_variance   | 0.962       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0426     |
|    n_updates            | 810         |
|    policy_gradient_loss | -0.00231    |
|    std                  | 1.1         |
|    value_loss           | 0.00235     |
-----------------------------------------
Eval num_timesteps=1350000, episode_reward=-24.30 +/- 32.03
Episode length: 2000.00 +/- 0.00
-----------------------------------------
| eval/                   |             |
|    mean_ep_length       | 2e+03       |
|    mean_reward          | -24.3       |
| time/                   |             |
|    total_timesteps      | 1350000     |
| train/                  |             |
|    approx_kl            | 0.007263833 |
|    clip_fraction        | 0.0797      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.03       |
|    explained_variance   | 0.957       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0338     |
|    n_updates            | 820         |
|    policy_gradient_loss | -0.00251    |
|    std                  | 1.11        |
|    value_loss           | 0.00397     |
-----------------------------------------

[Diag @ 1,350,000 | n_sheep=3 | success=0%]
  NEVER_COMPACT              16/20
  COMPACT_CANT_DRIVE         4/20
  action_mag mean=0.058 p10=0.004 p90=0.054 (0=stopped, 1=full speed)
  min_flock_radius mean=6.77m best=1.04m  (target <5m to compact)
  min_dog_to_com   mean=0.58m best=0.28m  (FLEE_DIST=7m)
  min_com_to_pen   mean=12.71m best=4.27m
  reward/step (mean): progress=-0.0038  alignment=+0.0015  pen_bonus=+0.0005  step_cost=-0.0200  complete=+0.0000
--------------------------------
| time/              |         |
|    fps             | 1859    |
|    iterations      | 83      |
|    time_elapsed    | 731     |
|    total_timesteps | 1359872 |
--------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1870        |
|    iterations           | 84          |
|    time_elapsed         | 735         |
|    total_timesteps      | 1376256     |
| train/                  |             |
|    approx_kl            | 0.007816839 |
|    clip_fraction        | 0.0812      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.05       |
|    explained_variance   | 0.946       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0285     |
|    n_updates            | 830         |
|    policy_gradient_loss | -0.00277    |
|    std                  | 1.11        |
|    value_loss           | 0.0018      |
-----------------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 1880         |
|    iterations           | 85           |
|    time_elapsed         | 740          |
|    total_timesteps      | 1392640      |
| train/                  |              |
|    approx_kl            | 0.0064534983 |
|    clip_fraction        | 0.0774       |
|    clip_range           | 0.2          |
|    entropy_loss         | -3.06        |
|    explained_variance   | 0.958        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0305      |
|    n_updates            | 840          |
|    policy_gradient_loss | -0.00158     |
|    std                  | 1.12         |
|    value_loss           | 0.00988      |
------------------------------------------
Eval num_timesteps=1400000, episode_reward=-39.10 +/- 41.08
Episode length: 2000.00 +/- 0.00
------------------------------------------
| eval/                   |              |
|    mean_ep_length       | 2e+03        |
|    mean_reward          | -39.1        |
| time/                   |              |
|    total_timesteps      | 1400000      |
| train/                  |              |
|    approx_kl            | 0.0069560152 |
|    clip_fraction        | 0.0835       |
|    clip_range           | 0.2          |
|    entropy_loss         | -3.07        |
|    explained_variance   | 0.96         |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0302      |
|    n_updates            | 850          |
|    policy_gradient_loss | -0.00283     |
|    std                  | 1.12         |
|    value_loss           | 0.00307      |
------------------------------------------
--------------------------------
| time/              |         |
|    fps             | 1857    |
|    iterations      | 86      |
|    time_elapsed    | 758     |
|    total_timesteps | 1409024 |
--------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1864        |
|    iterations           | 87          |
|    time_elapsed         | 764         |
|    total_timesteps      | 1425408     |
| train/                  |             |
|    approx_kl            | 0.007682803 |
|    clip_fraction        | 0.0931      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.09       |
|    explained_variance   | 0.902       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0322     |
|    n_updates            | 860         |
|    policy_gradient_loss | -0.00224    |
|    std                  | 1.14        |
|    value_loss           | 0.013       |
-----------------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 1869         |
|    iterations           | 88           |
|    time_elapsed         | 771          |
|    total_timesteps      | 1441792      |
| train/                  |              |
|    approx_kl            | 0.0063949013 |
|    clip_fraction        | 0.0786       |
|    clip_range           | 0.2          |
|    entropy_loss         | -3.1         |
|    explained_variance   | 0.953        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0401      |
|    n_updates            | 870          |
|    policy_gradient_loss | -0.00134     |
|    std                  | 1.14         |
|    value_loss           | 0.00193      |
------------------------------------------
Eval num_timesteps=1450000, episode_reward=-28.59 +/- 25.61
Episode length: 2000.00 +/- 0.00
-----------------------------------------
| eval/                   |             |
|    mean_ep_length       | 2e+03       |
|    mean_reward          | -28.6       |
| time/                   |             |
|    total_timesteps      | 1450000     |
| train/                  |             |
|    approx_kl            | 0.007503539 |
|    clip_fraction        | 0.0774      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.13       |
|    explained_variance   | 0.951       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0378     |
|    n_updates            | 880         |
|    policy_gradient_loss | -0.00309    |
|    std                  | 1.16        |
|    value_loss           | 0.00551     |
-----------------------------------------
--------------------------------
| time/              |         |
|    fps             | 1845    |
|    iterations      | 89      |
|    time_elapsed    | 789     |
|    total_timesteps | 1458176 |
--------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 1852         |
|    iterations           | 90           |
|    time_elapsed         | 796          |
|    total_timesteps      | 1474560      |
| train/                  |              |
|    approx_kl            | 0.0075057503 |
|    clip_fraction        | 0.0793       |
|    clip_range           | 0.2          |
|    entropy_loss         | -3.15        |
|    explained_variance   | 0.955        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0439      |
|    n_updates            | 890          |
|    policy_gradient_loss | -0.00264     |
|    std                  | 1.17         |
|    value_loss           | 0.00265      |
------------------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 1857         |
|    iterations           | 91           |
|    time_elapsed         | 802          |
|    total_timesteps      | 1490944      |
| train/                  |              |
|    approx_kl            | 0.0068523246 |
|    clip_fraction        | 0.0755       |
|    clip_range           | 0.2          |
|    entropy_loss         | -3.15        |
|    explained_variance   | 0.935        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0282      |
|    n_updates            | 900          |
|    policy_gradient_loss | -0.00292     |
|    std                  | 1.17         |
|    value_loss           | 0.00268      |
------------------------------------------
Eval num_timesteps=1500000, episode_reward=-40.66 +/- 25.29
Episode length: 2000.00 +/- 0.00
-----------------------------------------
| eval/                   |             |
|    mean_ep_length       | 2e+03       |
|    mean_reward          | -40.7       |
| time/                   |             |
|    total_timesteps      | 1500000     |
| train/                  |             |
|    approx_kl            | 0.007249858 |
|    clip_fraction        | 0.0857      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.15       |
|    explained_variance   | 0.952       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0366     |
|    n_updates            | 910         |
|    policy_gradient_loss | -0.00319    |
|    std                  | 1.17        |
|    value_loss           | 0.00564     |
-----------------------------------------

[Diag @ 1,500,000 | n_sheep=3 | success=0%]
  NEVER_COMPACT              14/20
  COMPACT_CANT_DRIVE         6/20
  action_mag mean=0.050 p10=0.005 p90=0.049 (0=stopped, 1=full speed)
  min_flock_radius mean=6.53m best=0.98m  (target <5m to compact)
  min_dog_to_com   mean=0.46m best=0.06m  (FLEE_DIST=7m)
  min_com_to_pen   mean=12.38m best=5.44m
  reward/step (mean): progress=+0.0039  alignment=+0.0011  pen_bonus=+0.0005  step_cost=-0.0200  complete=+0.0000
--------------------------------
| time/              |         |
|    fps             | 1819    |
|    iterations      | 92      |
|    time_elapsed    | 828     |
|    total_timesteps | 1507328 |
--------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1828        |
|    iterations           | 93          |
|    time_elapsed         | 833         |
|    total_timesteps      | 1523712     |
| train/                  |             |
|    approx_kl            | 0.007471386 |
|    clip_fraction        | 0.0834      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.16       |
|    explained_variance   | 0.929       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0275     |
|    n_updates            | 920         |
|    policy_gradient_loss | -0.00192    |
|    std                  | 1.17        |
|    value_loss           | 0.00791     |
-----------------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1835        |
|    iterations           | 94          |
|    time_elapsed         | 838         |
|    total_timesteps      | 1540096     |
| train/                  |             |
|    approx_kl            | 0.007296456 |
|    clip_fraction        | 0.0765      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.17       |
|    explained_variance   | 0.95        |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0484     |
|    n_updates            | 930         |
|    policy_gradient_loss | -0.00366    |
|    std                  | 1.18        |
|    value_loss           | 0.00788     |
-----------------------------------------
Eval num_timesteps=1550000, episode_reward=-34.66 +/- 25.47
Episode length: 2000.00 +/- 0.00
-----------------------------------------
| eval/                   |             |
|    mean_ep_length       | 2e+03       |
|    mean_reward          | -34.7       |
| time/                   |             |
|    total_timesteps      | 1550000     |
| train/                  |             |
|    approx_kl            | 0.007654687 |
|    clip_fraction        | 0.095       |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.18       |
|    explained_variance   | 0.92        |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0386     |
|    n_updates            | 940         |
|    policy_gradient_loss | -0.00316    |
|    std                  | 1.19        |
|    value_loss           | 0.00363     |
-----------------------------------------
--------------------------------
| time/              |         |
|    fps             | 1817    |
|    iterations      | 95      |
|    time_elapsed    | 856     |
|    total_timesteps | 1556480 |
--------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1823        |
|    iterations           | 96          |
|    time_elapsed         | 862         |
|    total_timesteps      | 1572864     |
| train/                  |             |
|    approx_kl            | 0.007030643 |
|    clip_fraction        | 0.0881      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.18       |
|    explained_variance   | 0.944       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0346     |
|    n_updates            | 950         |
|    policy_gradient_loss | -0.00321    |
|    std                  | 1.19        |
|    value_loss           | 0.00208     |
-----------------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 1828         |
|    iterations           | 97           |
|    time_elapsed         | 869          |
|    total_timesteps      | 1589248      |
| train/                  |              |
|    approx_kl            | 0.0071562277 |
|    clip_fraction        | 0.0834       |
|    clip_range           | 0.2          |
|    entropy_loss         | -3.19        |
|    explained_variance   | 0.955        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0196      |
|    n_updates            | 960          |
|    policy_gradient_loss | -0.00259     |
|    std                  | 1.2          |
|    value_loss           | 0.00773      |
------------------------------------------
Eval num_timesteps=1600000, episode_reward=-33.49 +/- 36.88
Episode length: 2000.00 +/- 0.00
------------------------------------------
| eval/                   |              |
|    mean_ep_length       | 2e+03        |
|    mean_reward          | -33.5        |
| time/                   |              |
|    total_timesteps      | 1600000      |
| train/                  |              |
|    approx_kl            | 0.0069667175 |
|    clip_fraction        | 0.0741       |
|    clip_range           | 0.2          |
|    entropy_loss         | -3.2         |
|    explained_variance   | 0.94         |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0313      |
|    n_updates            | 970          |
|    policy_gradient_loss | -0.00399     |
|    std                  | 1.2          |
|    value_loss           | 0.00419      |
------------------------------------------
--------------------------------
| time/              |         |
|    fps             | 1810    |
|    iterations      | 98      |
|    time_elapsed    | 886     |
|    total_timesteps | 1605632 |
--------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 1819         |
|    iterations           | 99           |
|    time_elapsed         | 891          |
|    total_timesteps      | 1622016      |
| train/                  |              |
|    approx_kl            | 0.0061995042 |
|    clip_fraction        | 0.0767       |
|    clip_range           | 0.2          |
|    entropy_loss         | -3.21        |
|    explained_variance   | 0.968        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.036       |
|    n_updates            | 980          |
|    policy_gradient_loss | -0.00289     |
|    std                  | 1.2          |
|    value_loss           | 0.00241      |
------------------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1826        |
|    iterations           | 100         |
|    time_elapsed         | 896         |
|    total_timesteps      | 1638400     |
| train/                  |             |
|    approx_kl            | 0.006502889 |
|    clip_fraction        | 0.0714      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.22       |
|    explained_variance   | 0.976       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0445     |
|    n_updates            | 990         |
|    policy_gradient_loss | -0.00314    |
|    std                  | 1.21        |
|    value_loss           | 0.00218     |
-----------------------------------------
Eval num_timesteps=1650000, episode_reward=-38.00 +/- 30.02
Episode length: 2000.00 +/- 0.00
-----------------------------------------
| eval/                   |             |
|    mean_ep_length       | 2e+03       |
|    mean_reward          | -38         |
| time/                   |             |
|    total_timesteps      | 1650000     |
| train/                  |             |
|    approx_kl            | 0.006163503 |
|    clip_fraction        | 0.0739      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.22       |
|    explained_variance   | 0.955       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0391     |
|    n_updates            | 1000        |
|    policy_gradient_loss | -0.00257    |
|    std                  | 1.22        |
|    value_loss           | 0.0027      |
-----------------------------------------

[Diag @ 1,650,000 | n_sheep=3 | success=0%]
  NEVER_COMPACT              16/20
  COMPACT_CANT_DRIVE         4/20
  action_mag mean=0.054 p10=0.002 p90=0.051 (0=stopped, 1=full speed)
  min_flock_radius mean=6.63m best=3.72m  (target <5m to compact)
  min_dog_to_com   mean=0.60m best=0.09m  (FLEE_DIST=7m)
  min_com_to_pen   mean=13.17m best=5.44m
  reward/step (mean): progress=+0.0032  alignment=+0.0015  pen_bonus=+0.0005  step_cost=-0.0200  complete=+0.0000
--------------------------------
| time/              |         |
|    fps             | 1793    |
|    iterations      | 101     |
|    time_elapsed    | 922     |
|    total_timesteps | 1654784 |
--------------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 1800       |
|    iterations           | 102        |
|    time_elapsed         | 927        |
|    total_timesteps      | 1671168    |
| train/                  |            |
|    approx_kl            | 0.00634938 |
|    clip_fraction        | 0.073      |
|    clip_range           | 0.2        |
|    entropy_loss         | -3.23      |
|    explained_variance   | 0.97       |
|    learning_rate        | 0.0003     |
|    loss                 | -0.0462    |
|    n_updates            | 1010       |
|    policy_gradient_loss | -0.00394   |
|    std                  | 1.22       |
|    value_loss           | 0.00334    |
----------------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 1807         |
|    iterations           | 103          |
|    time_elapsed         | 933          |
|    total_timesteps      | 1687552      |
| train/                  |              |
|    approx_kl            | 0.0072235917 |
|    clip_fraction        | 0.0774       |
|    clip_range           | 0.2          |
|    entropy_loss         | -3.23        |
|    explained_variance   | 0.957        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0284      |
|    n_updates            | 1020         |
|    policy_gradient_loss | -0.00292     |
|    std                  | 1.22         |
|    value_loss           | 0.00807      |
------------------------------------------
Eval num_timesteps=1700000, episode_reward=-32.26 +/- 31.96
Episode length: 2000.00 +/- 0.00
------------------------------------------
| eval/                   |              |
|    mean_ep_length       | 2e+03        |
|    mean_reward          | -32.3        |
| time/                   |              |
|    total_timesteps      | 1700000      |
| train/                  |              |
|    approx_kl            | 0.0060304543 |
|    clip_fraction        | 0.0721       |
|    clip_range           | 0.2          |
|    entropy_loss         | -3.23        |
|    explained_variance   | 0.929        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0427      |
|    n_updates            | 1030         |
|    policy_gradient_loss | -0.00306     |
|    std                  | 1.21         |
|    value_loss           | 0.00208      |
------------------------------------------
--------------------------------
| time/              |         |
|    fps             | 1789    |
|    iterations      | 104     |
|    time_elapsed    | 952     |
|    total_timesteps | 1703936 |
--------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1795        |
|    iterations           | 105         |
|    time_elapsed         | 958         |
|    total_timesteps      | 1720320     |
| train/                  |             |
|    approx_kl            | 0.006440907 |
|    clip_fraction        | 0.0642      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.22       |
|    explained_variance   | 0.947       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0317     |
|    n_updates            | 1040        |
|    policy_gradient_loss | -0.00158    |
|    std                  | 1.21        |
|    value_loss           | 0.00165     |
-----------------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1801        |
|    iterations           | 106         |
|    time_elapsed         | 963         |
|    total_timesteps      | 1736704     |
| train/                  |             |
|    approx_kl            | 0.006897255 |
|    clip_fraction        | 0.0738      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.2        |
|    explained_variance   | 0.939       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0408     |
|    n_updates            | 1050        |
|    policy_gradient_loss | -0.00349    |
|    std                  | 1.19        |
|    value_loss           | 0.00814     |
-----------------------------------------
Eval num_timesteps=1750000, episode_reward=-40.58 +/- 28.91
Episode length: 2000.00 +/- 0.00
------------------------------------------
| eval/                   |              |
|    mean_ep_length       | 2e+03        |
|    mean_reward          | -40.6        |
| time/                   |              |
|    total_timesteps      | 1750000      |
| train/                  |              |
|    approx_kl            | 0.0070952754 |
|    clip_fraction        | 0.0742       |
|    clip_range           | 0.2          |
|    entropy_loss         | -3.19        |
|    explained_variance   | 0.957        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0308      |
|    n_updates            | 1060         |
|    policy_gradient_loss | -0.0037      |
|    std                  | 1.19         |
|    value_loss           | 0.0191       |
------------------------------------------
--------------------------------
| time/              |         |
|    fps             | 1784    |
|    iterations      | 107     |
|    time_elapsed    | 982     |
|    total_timesteps | 1753088 |
--------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1791        |
|    iterations           | 108         |
|    time_elapsed         | 987         |
|    total_timesteps      | 1769472     |
| train/                  |             |
|    approx_kl            | 0.006444447 |
|    clip_fraction        | 0.0736      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.2        |
|    explained_variance   | 0.968       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0362     |
|    n_updates            | 1070        |
|    policy_gradient_loss | -0.00409    |
|    std                  | 1.2         |
|    value_loss           | 0.00395     |
-----------------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1797        |
|    iterations           | 109         |
|    time_elapsed         | 993         |
|    total_timesteps      | 1785856     |
| train/                  |             |
|    approx_kl            | 0.007391736 |
|    clip_fraction        | 0.0758      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.22       |
|    explained_variance   | 0.96        |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0341     |
|    n_updates            | 1080        |
|    policy_gradient_loss | -0.00272    |
|    std                  | 1.21        |
|    value_loss           | 0.00221     |
-----------------------------------------
Eval num_timesteps=1800000, episode_reward=-29.06 +/- 30.98
Episode length: 2000.00 +/- 0.00
-----------------------------------------
| eval/                   |             |
|    mean_ep_length       | 2e+03       |
|    mean_reward          | -29.1       |
| time/                   |             |
|    total_timesteps      | 1800000     |
| train/                  |             |
|    approx_kl            | 0.006899439 |
|    clip_fraction        | 0.0695      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.25       |
|    explained_variance   | 0.965       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0317     |
|    n_updates            | 1090        |
|    policy_gradient_loss | -0.00226    |
|    std                  | 1.23        |
|    value_loss           | 0.00615     |
-----------------------------------------

[Diag @ 1,800,000 | n_sheep=3 | success=0%]
  NEVER_COMPACT              11/20
  COMPACT_CANT_DRIVE         9/20
  action_mag mean=0.054 p10=0.003 p90=0.057 (0=stopped, 1=full speed)
  min_flock_radius mean=6.01m best=1.13m  (target <5m to compact)
  min_dog_to_com   mean=0.51m best=0.11m  (FLEE_DIST=7m)
  min_com_to_pen   mean=12.52m best=3.21m
  reward/step (mean): progress=+0.0050  alignment=+0.0017  pen_bonus=+0.0008  step_cost=-0.0200  complete=+0.0000

[Curriculum] leaving stage n_sheep=3 after 600,000 steps | training success rate (last 100 eps) = 0%
[Curriculum] → 4 sheep at step 1,800,000

--------------------------------
| time/              |         |
|    fps             | 1769    |
|    iterations      | 110     |
|    time_elapsed    | 1018    |
|    total_timesteps | 1802240 |
--------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1776        |
|    iterations           | 111         |
|    time_elapsed         | 1023        |
|    total_timesteps      | 1818624     |
| train/                  |             |
|    approx_kl            | 0.006710761 |
|    clip_fraction        | 0.0761      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.25       |
|    explained_variance   | 0.867       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.031      |
|    n_updates            | 1100        |
|    policy_gradient_loss | -0.00311    |
|    std                  | 1.23        |
|    value_loss           | 0.0186      |
-----------------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1783        |
|    iterations           | 112         |
|    time_elapsed         | 1028        |
|    total_timesteps      | 1835008     |
| train/                  |             |
|    approx_kl            | 0.006202608 |
|    clip_fraction        | 0.0682      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.25       |
|    explained_variance   | 0.954       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0245     |
|    n_updates            | 1110        |
|    policy_gradient_loss | -0.00429    |
|    std                  | 1.23        |
|    value_loss           | 0.00641     |
-----------------------------------------
Eval num_timesteps=1850000, episode_reward=-35.87 +/- 42.36
Episode length: 2000.00 +/- 0.00
-----------------------------------------
| eval/                   |             |
|    mean_ep_length       | 2e+03       |
|    mean_reward          | -35.9       |
| time/                   |             |
|    total_timesteps      | 1850000     |
| train/                  |             |
|    approx_kl            | 0.008398036 |
|    clip_fraction        | 0.086       |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.28       |
|    explained_variance   | 0.938       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0514     |
|    n_updates            | 1120        |
|    policy_gradient_loss | -0.00497    |
|    std                  | 1.25        |
|    value_loss           | 0.00614     |
-----------------------------------------
--------------------------------
| time/              |         |
|    fps             | 1768    |
|    iterations      | 113     |
|    time_elapsed    | 1046    |
|    total_timesteps | 1851392 |
--------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1775        |
|    iterations           | 114         |
|    time_elapsed         | 1052        |
|    total_timesteps      | 1867776     |
| train/                  |             |
|    approx_kl            | 0.007641702 |
|    clip_fraction        | 0.0742      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.31       |
|    explained_variance   | 0.935       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.046      |
|    n_updates            | 1130        |
|    policy_gradient_loss | -0.00349    |
|    std                  | 1.28        |
|    value_loss           | 0.0228      |
-----------------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 1781         |
|    iterations           | 115          |
|    time_elapsed         | 1057         |
|    total_timesteps      | 1884160      |
| train/                  |              |
|    approx_kl            | 0.0073437546 |
|    clip_fraction        | 0.0747       |
|    clip_range           | 0.2          |
|    entropy_loss         | -3.34        |
|    explained_variance   | 0.928        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0498      |
|    n_updates            | 1140         |
|    policy_gradient_loss | -0.00496     |
|    std                  | 1.29         |
|    value_loss           | 0.00764      |
------------------------------------------
Eval num_timesteps=1900000, episode_reward=-41.88 +/- 27.01
Episode length: 2000.00 +/- 0.00
-----------------------------------------
| eval/                   |             |
|    mean_ep_length       | 2e+03       |
|    mean_reward          | -41.9       |
| time/                   |             |
|    total_timesteps      | 1900000     |
| train/                  |             |
|    approx_kl            | 0.006885264 |
|    clip_fraction        | 0.0728      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.36       |
|    explained_variance   | 0.934       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0503     |
|    n_updates            | 1150        |
|    policy_gradient_loss | -0.00384    |
|    std                  | 1.3         |
|    value_loss           | 0.00423     |
-----------------------------------------
--------------------------------
| time/              |         |
|    fps             | 1767    |
|    iterations      | 116     |
|    time_elapsed    | 1075    |
|    total_timesteps | 1900544 |
--------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 1773         |
|    iterations           | 117          |
|    time_elapsed         | 1080         |
|    total_timesteps      | 1916928      |
| train/                  |              |
|    approx_kl            | 0.0077611385 |
|    clip_fraction        | 0.0792       |
|    clip_range           | 0.2          |
|    entropy_loss         | -3.38        |
|    explained_variance   | 0.931        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0374      |
|    n_updates            | 1160         |
|    policy_gradient_loss | -0.00399     |
|    std                  | 1.31         |
|    value_loss           | 0.00292      |
------------------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1780        |
|    iterations           | 118         |
|    time_elapsed         | 1085        |
|    total_timesteps      | 1933312     |
| train/                  |             |
|    approx_kl            | 0.006831214 |
|    clip_fraction        | 0.0758      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.4        |
|    explained_variance   | 0.963       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0175     |
|    n_updates            | 1170        |
|    policy_gradient_loss | -0.00471    |
|    std                  | 1.33        |
|    value_loss           | 0.00235     |
-----------------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1786        |
|    iterations           | 119         |
|    time_elapsed         | 1091        |
|    total_timesteps      | 1949696     |
| train/                  |             |
|    approx_kl            | 0.006474304 |
|    clip_fraction        | 0.0666      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.43       |
|    explained_variance   | 0.931       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0318     |
|    n_updates            | 1180        |
|    policy_gradient_loss | -0.00285    |
|    std                  | 1.35        |
|    value_loss           | 0.00699     |
-----------------------------------------
Eval num_timesteps=1950000, episode_reward=-35.80 +/- 28.95
Episode length: 2000.00 +/- 0.00
-----------------------------------------
| eval/                   |             |
|    mean_ep_length       | 2e+03       |
|    mean_reward          | -35.8       |
| time/                   |             |
|    total_timesteps      | 1950000     |
| train/                  |             |
|    approx_kl            | 0.008532442 |
|    clip_fraction        | 0.0746      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.43       |
|    explained_variance   | 0.958       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.00337    |
|    n_updates            | 1190        |
|    policy_gradient_loss | -0.00376    |
|    std                  | 1.34        |
|    value_loss           | 0.0156      |
-----------------------------------------

[Diag @ 1,950,000 | n_sheep=4 | success=0%]
  NEVER_COMPACT              19/20
  COMPACT_CANT_DRIVE         1/20
  action_mag mean=0.049 p10=0.007 p90=0.044 (0=stopped, 1=full speed)
  min_flock_radius mean=8.95m best=4.96m  (target <5m to compact)
  min_dog_to_com   mean=0.39m best=0.07m  (FLEE_DIST=7m)
  min_com_to_pen   mean=14.18m best=9.30m
  reward/step (mean): progress=-0.0121  alignment=+0.0010  pen_bonus=+0.0005  step_cost=-0.0200  complete=+0.0000
--------------------------------
| time/              |         |
|    fps             | 1759    |
|    iterations      | 120     |
|    time_elapsed    | 1117    |
|    total_timesteps | 1966080 |
--------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1766        |
|    iterations           | 121         |
|    time_elapsed         | 1122        |
|    total_timesteps      | 1982464     |
| train/                  |             |
|    approx_kl            | 0.006549825 |
|    clip_fraction        | 0.0665      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.43       |
|    explained_variance   | 0.966       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0345     |
|    n_updates            | 1200        |
|    policy_gradient_loss | -0.00349    |
|    std                  | 1.34        |
|    value_loss           | 0.00315     |
-----------------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 1773         |
|    iterations           | 122          |
|    time_elapsed         | 1127         |
|    total_timesteps      | 1998848      |
| train/                  |              |
|    approx_kl            | 0.0062008686 |
|    clip_fraction        | 0.0699       |
|    clip_range           | 0.2          |
|    entropy_loss         | -3.44        |
|    explained_variance   | 0.959        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0512      |
|    n_updates            | 1210         |
|    policy_gradient_loss | -0.00291     |
|    std                  | 1.35         |
|    value_loss           | 0.00544      |
------------------------------------------
Eval num_timesteps=2000000, episode_reward=-45.28 +/- 26.78
Episode length: 2000.00 +/- 0.00
-----------------------------------------
| eval/                   |             |
|    mean_ep_length       | 2e+03       |
|    mean_reward          | -45.3       |
| time/                   |             |
|    total_timesteps      | 2000000     |
| train/                  |             |
|    approx_kl            | 0.006553275 |
|    clip_fraction        | 0.0739      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.45       |
|    explained_variance   | 0.924       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0416     |
|    n_updates            | 1220        |
|    policy_gradient_loss | -0.00427    |
|    std                  | 1.36        |
|    value_loss           | 0.0127      |
-----------------------------------------
--------------------------------
| time/              |         |
|    fps             | 1761    |
|    iterations      | 123     |
|    time_elapsed    | 1144    |
|    total_timesteps | 2015232 |
--------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 1767         |
|    iterations           | 124          |
|    time_elapsed         | 1149         |
|    total_timesteps      | 2031616      |
| train/                  |              |
|    approx_kl            | 0.0059226304 |
|    clip_fraction        | 0.0653       |
|    clip_range           | 0.2          |
|    entropy_loss         | -3.46        |
|    explained_variance   | 0.947        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.025       |
|    n_updates            | 1230         |
|    policy_gradient_loss | -0.00273     |
|    std                  | 1.36         |
|    value_loss           | 0.00879      |
------------------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 1775         |
|    iterations           | 125          |
|    time_elapsed         | 1153         |
|    total_timesteps      | 2048000      |
| train/                  |              |
|    approx_kl            | 0.0076779695 |
|    clip_fraction        | 0.0729       |
|    clip_range           | 0.2          |
|    entropy_loss         | -3.47        |
|    explained_variance   | 0.931        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0382      |
|    n_updates            | 1240         |
|    policy_gradient_loss | -0.00385     |
|    std                  | 1.37         |
|    value_loss           | 0.00692      |
------------------------------------------
Eval num_timesteps=2050000, episode_reward=-44.22 +/- 28.52
Episode length: 2000.00 +/- 0.00
------------------------------------------
| eval/                   |              |
|    mean_ep_length       | 2e+03        |
|    mean_reward          | -44.2        |
| time/                   |              |
|    total_timesteps      | 2050000      |
| train/                  |              |
|    approx_kl            | 0.0073502595 |
|    clip_fraction        | 0.0822       |
|    clip_range           | 0.2          |
|    entropy_loss         | -3.49        |
|    explained_variance   | 0.946        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0342      |
|    n_updates            | 1250         |
|    policy_gradient_loss | -0.00592     |
|    std                  | 1.39         |
|    value_loss           | 0.00555      |
------------------------------------------
--------------------------------
| time/              |         |
|    fps             | 1764    |
|    iterations      | 126     |
|    time_elapsed    | 1170    |
|    total_timesteps | 2064384 |
--------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1770        |
|    iterations           | 127         |
|    time_elapsed         | 1175        |
|    total_timesteps      | 2080768     |
| train/                  |             |
|    approx_kl            | 0.006628736 |
|    clip_fraction        | 0.0767      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.51       |
|    explained_variance   | 0.95        |
|    learning_rate        | 0.0003      |
|    loss                 | -0.035      |
|    n_updates            | 1260        |
|    policy_gradient_loss | -0.00457    |
|    std                  | 1.4         |
|    value_loss           | 0.00416     |
-----------------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 1776         |
|    iterations           | 128          |
|    time_elapsed         | 1180         |
|    total_timesteps      | 2097152      |
| train/                  |              |
|    approx_kl            | 0.0068027405 |
|    clip_fraction        | 0.0719       |
|    clip_range           | 0.2          |
|    entropy_loss         | -3.53        |
|    explained_variance   | 0.891        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0391      |
|    n_updates            | 1270         |
|    policy_gradient_loss | -0.00312     |
|    std                  | 1.42         |
|    value_loss           | 0.00492      |
------------------------------------------
Eval num_timesteps=2100000, episode_reward=-39.37 +/- 34.76
Episode length: 2000.00 +/- 0.00
-----------------------------------------
| eval/                   |             |
|    mean_ep_length       | 2e+03       |
|    mean_reward          | -39.4       |
| time/                   |             |
|    total_timesteps      | 2100000     |
| train/                  |             |
|    approx_kl            | 0.005523986 |
|    clip_fraction        | 0.0604      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.54       |
|    explained_variance   | 0.938       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0364     |
|    n_updates            | 1280        |
|    policy_gradient_loss | -0.00281    |
|    std                  | 1.42        |
|    value_loss           | 0.015       |
-----------------------------------------

[Diag @ 2,100,000 | n_sheep=4 | success=0%]
  NEVER_COMPACT              20/20
  action_mag mean=0.047 p10=0.002 p90=0.041 (0=stopped, 1=full speed)
  min_flock_radius mean=8.62m best=5.89m  (target <5m to compact)
  min_dog_to_com   mean=0.46m best=0.04m  (FLEE_DIST=7m)
  min_com_to_pen   mean=14.19m best=7.53m
  reward/step (mean): progress=-0.0012  alignment=+0.0012  pen_bonus=+0.0010  step_cost=-0.0200  complete=+0.0000
--------------------------------
| time/              |         |
|    fps             | 1751    |
|    iterations      | 129     |
|    time_elapsed    | 1206    |
|    total_timesteps | 2113536 |
--------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1756        |
|    iterations           | 130         |
|    time_elapsed         | 1212        |
|    total_timesteps      | 2129920     |
| train/                  |             |
|    approx_kl            | 0.007766474 |
|    clip_fraction        | 0.0823      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.53       |
|    explained_variance   | 0.96        |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0396     |
|    n_updates            | 1290        |
|    policy_gradient_loss | -0.00492    |
|    std                  | 1.41        |
|    value_loss           | 0.00554     |
-----------------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1762        |
|    iterations           | 131         |
|    time_elapsed         | 1217        |
|    total_timesteps      | 2146304     |
| train/                  |             |
|    approx_kl            | 0.006704482 |
|    clip_fraction        | 0.0748      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.53       |
|    explained_variance   | 0.97        |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0466     |
|    n_updates            | 1300        |
|    policy_gradient_loss | -0.00339    |
|    std                  | 1.42        |
|    value_loss           | 0.00432     |
-----------------------------------------
Eval num_timesteps=2150000, episode_reward=-43.17 +/- 26.95
Episode length: 2000.00 +/- 0.00
------------------------------------------
| eval/                   |              |
|    mean_ep_length       | 2e+03        |
|    mean_reward          | -43.2        |
| time/                   |              |
|    total_timesteps      | 2150000      |
| train/                  |              |
|    approx_kl            | 0.0065447316 |
|    clip_fraction        | 0.0751       |
|    clip_range           | 0.2          |
|    entropy_loss         | -3.53        |
|    explained_variance   | 0.888        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0369      |
|    n_updates            | 1310         |
|    policy_gradient_loss | -0.00369     |
|    std                  | 1.41         |
|    value_loss           | 0.0165       |
------------------------------------------
--------------------------------
| time/              |         |
|    fps             | 1750    |
|    iterations      | 132     |
|    time_elapsed    | 1235    |
|    total_timesteps | 2162688 |
--------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 1755         |
|    iterations           | 133          |
|    time_elapsed         | 1241         |
|    total_timesteps      | 2179072      |
| train/                  |              |
|    approx_kl            | 0.0070872563 |
|    clip_fraction        | 0.075        |
|    clip_range           | 0.2          |
|    entropy_loss         | -3.54        |
|    explained_variance   | 0.954        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0427      |
|    n_updates            | 1320         |
|    policy_gradient_loss | -0.00406     |
|    std                  | 1.42         |
|    value_loss           | 0.00977      |
------------------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 1762         |
|    iterations           | 134          |
|    time_elapsed         | 1245         |
|    total_timesteps      | 2195456      |
| train/                  |              |
|    approx_kl            | 0.0073371828 |
|    clip_fraction        | 0.077        |
|    clip_range           | 0.2          |
|    entropy_loss         | -3.55        |
|    explained_variance   | 0.939        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0303      |
|    n_updates            | 1330         |
|    policy_gradient_loss | -0.00371     |
|    std                  | 1.43         |
|    value_loss           | 0.00862      |
------------------------------------------
Eval num_timesteps=2200000, episode_reward=-40.81 +/- 44.39
Episode length: 2000.00 +/- 0.00
------------------------------------------
| eval/                   |              |
|    mean_ep_length       | 2e+03        |
|    mean_reward          | -40.8        |
| time/                   |              |
|    total_timesteps      | 2200000      |
| train/                  |              |
|    approx_kl            | 0.0072064474 |
|    clip_fraction        | 0.0714       |
|    clip_range           | 0.2          |
|    entropy_loss         | -3.58        |
|    explained_variance   | 0.951        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0517      |
|    n_updates            | 1340         |
|    policy_gradient_loss | -0.00405     |
|    std                  | 1.45         |
|    value_loss           | 0.00351      |
------------------------------------------
--------------------------------
| time/              |         |
|    fps             | 1751    |
|    iterations      | 135     |
|    time_elapsed    | 1262    |
|    total_timesteps | 2211840 |
--------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1758        |
|    iterations           | 136         |
|    time_elapsed         | 1267        |
|    total_timesteps      | 2228224     |
| train/                  |             |
|    approx_kl            | 0.008551812 |
|    clip_fraction        | 0.0911      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.58       |
|    explained_variance   | 0.929       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0258     |
|    n_updates            | 1350        |
|    policy_gradient_loss | -0.00599    |
|    std                  | 1.45        |
|    value_loss           | 0.0034      |
-----------------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1764        |
|    iterations           | 137         |
|    time_elapsed         | 1271        |
|    total_timesteps      | 2244608     |
| train/                  |             |
|    approx_kl            | 0.006960677 |
|    clip_fraction        | 0.0702      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.59       |
|    explained_variance   | 0.9         |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0396     |
|    n_updates            | 1360        |
|    policy_gradient_loss | -0.00412    |
|    std                  | 1.46        |
|    value_loss           | 0.00429     |
-----------------------------------------
Eval num_timesteps=2250000, episode_reward=-37.92 +/- 31.68
Episode length: 2000.00 +/- 0.00
-----------------------------------------
| eval/                   |             |
|    mean_ep_length       | 2e+03       |
|    mean_reward          | -37.9       |
| time/                   |             |
|    total_timesteps      | 2250000     |
| train/                  |             |
|    approx_kl            | 0.005949891 |
|    clip_fraction        | 0.0683      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.59       |
|    explained_variance   | 0.948       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0381     |
|    n_updates            | 1370        |
|    policy_gradient_loss | -0.00328    |
|    std                  | 1.46        |
|    value_loss           | 0.0113      |
-----------------------------------------

[Diag @ 2,250,000 | n_sheep=4 | success=0%]
  NEVER_COMPACT              19/20
  COMPACT_CANT_DRIVE         1/20
  action_mag mean=0.068 p10=0.004 p90=0.045 (0=stopped, 1=full speed)
  min_flock_radius mean=7.87m best=3.57m  (target <5m to compact)
  min_dog_to_com   mean=0.45m best=0.15m  (FLEE_DIST=7m)
  min_com_to_pen   mean=14.06m best=6.95m
  reward/step (mean): progress=-0.0035  alignment=+0.0020  pen_bonus=+0.0008  step_cost=-0.0200  complete=+0.0000
--------------------------------
| time/              |         |
|    fps             | 1743    |
|    iterations      | 138     |
|    time_elapsed    | 1297    |
|    total_timesteps | 2260992 |
--------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 1749         |
|    iterations           | 139          |
|    time_elapsed         | 1301         |
|    total_timesteps      | 2277376      |
| train/                  |              |
|    approx_kl            | 0.0071727796 |
|    clip_fraction        | 0.0784       |
|    clip_range           | 0.2          |
|    entropy_loss         | -3.6         |
|    explained_variance   | 0.943        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0387      |
|    n_updates            | 1380         |
|    policy_gradient_loss | -0.0042      |
|    std                  | 1.46         |
|    value_loss           | 0.0113       |
------------------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1755        |
|    iterations           | 140         |
|    time_elapsed         | 1306        |
|    total_timesteps      | 2293760     |
| train/                  |             |
|    approx_kl            | 0.006800391 |
|    clip_fraction        | 0.0662      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.59       |
|    explained_variance   | 0.931       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0283     |
|    n_updates            | 1390        |
|    policy_gradient_loss | -0.00421    |
|    std                  | 1.46        |
|    value_loss           | 0.00659     |
-----------------------------------------
Eval num_timesteps=2300000, episode_reward=-47.47 +/- 37.24
Episode length: 2000.00 +/- 0.00
-----------------------------------------
| eval/                   |             |
|    mean_ep_length       | 2e+03       |
|    mean_reward          | -47.5       |
| time/                   |             |
|    total_timesteps      | 2300000     |
| train/                  |             |
|    approx_kl            | 0.008103053 |
|    clip_fraction        | 0.081       |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.59       |
|    explained_variance   | 0.945       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0433     |
|    n_updates            | 1400        |
|    policy_gradient_loss | -0.00404    |
|    std                  | 1.46        |
|    value_loss           | 0.00796     |
-----------------------------------------
--------------------------------
| time/              |         |
|    fps             | 1745    |
|    iterations      | 141     |
|    time_elapsed    | 1323    |
|    total_timesteps | 2310144 |
--------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 1751         |
|    iterations           | 142          |
|    time_elapsed         | 1328         |
|    total_timesteps      | 2326528      |
| train/                  |              |
|    approx_kl            | 0.0061590094 |
|    clip_fraction        | 0.066        |
|    clip_range           | 0.2          |
|    entropy_loss         | -3.61        |
|    explained_variance   | 0.957        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0436      |
|    n_updates            | 1410         |
|    policy_gradient_loss | -0.00287     |
|    std                  | 1.47         |
|    value_loss           | 0.0102       |
------------------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 1757         |
|    iterations           | 143          |
|    time_elapsed         | 1332         |
|    total_timesteps      | 2342912      |
| train/                  |              |
|    approx_kl            | 0.0070403973 |
|    clip_fraction        | 0.0733       |
|    clip_range           | 0.2          |
|    entropy_loss         | -3.62        |
|    explained_variance   | 0.863        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0356      |
|    n_updates            | 1420         |
|    policy_gradient_loss | -0.00525     |
|    std                  | 1.48         |
|    value_loss           | 0.0103       |
------------------------------------------
Eval num_timesteps=2350000, episode_reward=-47.95 +/- 27.60
Episode length: 2000.00 +/- 0.00
-----------------------------------------
| eval/                   |             |
|    mean_ep_length       | 2e+03       |
|    mean_reward          | -48         |
| time/                   |             |
|    total_timesteps      | 2350000     |
| train/                  |             |
|    approx_kl            | 0.007505033 |
|    clip_fraction        | 0.0729      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.64       |
|    explained_variance   | 0.94        |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0473     |
|    n_updates            | 1430        |
|    policy_gradient_loss | -0.00385    |
|    std                  | 1.5         |
|    value_loss           | 0.00449     |
-----------------------------------------
--------------------------------
| time/              |         |
|    fps             | 1747    |
|    iterations      | 144     |
|    time_elapsed    | 1350    |
|    total_timesteps | 2359296 |
--------------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 1752       |
|    iterations           | 145        |
|    time_elapsed         | 1355       |
|    total_timesteps      | 2375680    |
| train/                  |            |
|    approx_kl            | 0.00724002 |
|    clip_fraction        | 0.0739     |
|    clip_range           | 0.2        |
|    entropy_loss         | -3.65      |
|    explained_variance   | 0.948      |
|    learning_rate        | 0.0003     |
|    loss                 | -0.0419    |
|    n_updates            | 1440       |
|    policy_gradient_loss | -0.00426   |
|    std                  | 1.5        |
|    value_loss           | 0.00886    |
----------------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1758        |
|    iterations           | 146         |
|    time_elapsed         | 1360        |
|    total_timesteps      | 2392064     |
| train/                  |             |
|    approx_kl            | 0.007578165 |
|    clip_fraction        | 0.0713      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.64       |
|    explained_variance   | 0.859       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0427     |
|    n_updates            | 1450        |
|    policy_gradient_loss | -0.0049     |
|    std                  | 1.49        |
|    value_loss           | 0.00429     |
-----------------------------------------
Eval num_timesteps=2400000, episode_reward=-47.88 +/- 34.39
Episode length: 2000.00 +/- 0.00
-----------------------------------------
| eval/                   |             |
|    mean_ep_length       | 2e+03       |
|    mean_reward          | -47.9       |
| time/                   |             |
|    total_timesteps      | 2400000     |
| train/                  |             |
|    approx_kl            | 0.006707498 |
|    clip_fraction        | 0.0692      |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.65       |
|    explained_variance   | 0.861       |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0426     |
|    n_updates            | 1460        |
|    policy_gradient_loss | -0.00411    |
|    std                  | 1.5         |
|    value_loss           | 0.00639     |
-----------------------------------------

[Diag @ 2,400,000 | n_sheep=4 | success=0%]
  NEVER_COMPACT              19/20
  COMPACT_CANT_DRIVE         1/20
  action_mag mean=0.052 p10=0.005 p90=0.045 (0=stopped, 1=full speed)
  min_flock_radius mean=8.79m best=3.32m  (target <5m to compact)
  min_dog_to_com   mean=0.45m best=0.20m  (FLEE_DIST=7m)
  min_com_to_pen   mean=13.96m best=9.02m
  reward/step (mean): progress=-0.0047  alignment=+0.0013  pen_bonus=+0.0005  step_cost=-0.0200  complete=+0.0000
--------------------------------
| time/              |         |
|    fps             | 1737    |
|    iterations      | 147     |
|    time_elapsed    | 1386    |
|    total_timesteps | 2408448 |
--------------------------------

Training complete. Artefacts saved to runs/ppo_fix_check2/