TIR_PROJ/training/runs/sweep_smoke.log

Sweep dir: runs/sweep_20260425_124021
Search space: ['W_PER_SHEEP', 'W_ALIGN', 'W_PEN_BONUS', 'W_STEP_COST', 'W_COMPLETE', 'W_COMPACT', 'ALIGN_SHAPE', 'ALIGN_GATED', 'ent_coef']
Per-trial: 1,000,000 steps train + 30 eval eps
Time budget: 0.5h

[Trial   1] {'W_PER_SHEEP': 1.0, 'W_ALIGN': 0.1, 'W_PEN_BONUS': 10.0, 'W_STEP_COST': 0.02, 'W_COMPLETE': 100.0, 'W_COMPACT': 3.0, 'ALIGN_SHAPE': 'standoff', 'ALIGN_GATED': False, 'ent_coef': 0.005}
           ... [trial 1 | 1 sheep |  50,000 steps | ret(last 32)=-8.33  sr=6%]
           ... [trial 1 | 1 sheep | 100,000 steps | ret(last 50)=-2.95  sr=6%]
           ... [trial 1 | 1 sheep | 150,000 steps | ret(last 50)=+12.68  sr=10%]
           ... [trial 1 | 1 sheep | 200,000 steps | ret(last 50)=+22.15  sr=22%]
           ... [trial 1 | 1 sheep | 250,000 steps | ret(last 50)=+22.47  sr=18%]
           ... [trial 1 | 1 sheep | 300,000 steps | ret(last 50)=+23.58  sr=24%]
           ... [trial 1 | 1 sheep | 350,000 steps | ret(last 50)=+23.42  sr=18%]
           ... [trial 1 | 1 sheep | 400,000 steps | ret(last 50)=+24.39  sr=32%]
           ... [trial 1 | 2 sheep | 409,608 steps | ret(last 0)=+nan  sr=nan%]
           ... [trial 1 | 2 sheep | 459,608 steps | ret(last 35)=+15.39  sr=3%]
           ... [trial 1 | 2 sheep | 509,608 steps | ret(last 50)=+20.25  sr=0%]
           ... [trial 1 | 2 sheep | 559,608 steps | ret(last 50)=+23.24  sr=4%]
           ... [trial 1 | 2 sheep | 609,608 steps | ret(last 50)=+23.36  sr=4%]
           ... [trial 1 | 2 sheep | 659,608 steps | ret(last 50)=+25.32  sr=2%]
           ... [trial 1 | 2 sheep | 709,608 steps | ret(last 50)=+24.02  sr=4%]
           ... [trial 1 | 2 sheep | 759,608 steps | ret(last 50)=+24.66  sr=4%]
           ... [trial 1 | 2 sheep | 809,608 steps | ret(last 50)=+25.41  sr=4%]
           ... [trial 1 | 2 sheep | 859,608 steps | ret(last 50)=+24.27  sr=4%]
           ... [trial 1 | 2 sheep | 909,608 steps | ret(last 50)=+25.13  sr=8%]
           ... [trial 1 | 2 sheep | 959,608 steps | ret(last 50)=+25.10  sr=2%]
           ... [trial 1 | 2 sheep | 1,009,608 steps | ret(last 50)=+26.02  sr=2%]
           ... [trial 1 | eval n=1]
           ... [trial 1 | eval n=2]
           ... [trial 1 | eval n=3]
           → score=0.060  sr1=0.30 sr2=0.00 sr3=0.00  [308s]

============================================================================================
  LEADERBOARD
============================================================================================
  rank  score   sr1   sr2   sr3  config
  ----------------------------------------------------------------------------------------
     1  0.060  0.30  0.00  0.00  W_PER_SHEEP=1.0 W_ALIGN=0.1 W_PEN_BONUS=10.0 W_STEP_COST=0.02 W_COMPLETE=100.0 W_COMPACT=3.0 ALIGN_SHAPE=standoff ALIGN_GATED=False ent_coef=0.005

  Best config saved to runs/sweep_20260425_124021/best.json
  Total trials: 1 (1 successful, 0 failed)
  Total time:   0.09h