Checkpoint 6

2026-05-11 10:35:48 +01:00
parent b457155538
commit fce0e0c786
27 changed files with 194 additions and 704 deletions
@@ -12,7 +12,7 @@ gate into an external pen. The dog has three deployable modes:
 | `rl` | KL-regularised PPO fine-tune of `bc` | Reward-driven refinement |

 `sequential` (single-target pin-and-push) is kept as an alternative
-analytic baseline. `dagger` is a data-collection mode, not deployment.
+analytic baseline.

 ## Perception

@@ -28,13 +28,13 @@ control step:
   (`herding/sheep_tracker.py`).

 **LiDAR validation** (intermediate-goal item v from `docs/project.md`):
-run the dog controller in `HERDING_MODE=diag` mode to capture 80
-real Webots scans plus the ground-truth sheep positions in
-`training/dagger/diag_<ts>.npz`. Comparing detections against GT in
-that file showed clustered centroids match GT positions within 0.15 m
-after the +SHEEP_RADIUS surface-to-centre correction — i.e. the
-LiDAR pipeline produces correct sheep-position estimates from the
-real Webots scan, validating the sensor for the herding task.
+during development a diagnostic-dump controller captured 80 real
+Webots scans plus the ground-truth sheep positions. Comparing
+detections against GT showed clustered centroids match GT positions
+within 0.15 m after the +SHEEP_RADIUS surface-to-centre correction —
+i.e. the LiDAR pipeline produces correct sheep-position estimates
+from the real Webots scan, validating the sensor for the herding
+task.

 The tracker outputs a `{name: (x, y)}` dict shaped exactly like the
 prior receiver-based one, so Strömbom, Sequential, and the BC obs
@@ -53,7 +53,7 @@ Privileged ground-truth perception is available for ablation —
 pip install -r training/requirements.txt

 # 2. Smoke test
-python -m training.parity_test
+python -m tests.parity_test

 # 3. Reproduce the BC policy (~10 min on CPU: ~5 min demos + ~3 min BC)
 python -m tools.collect_demos --teacher strombom \
@@ -61,21 +61,17 @@ python -m tools.collect_demos --teacher strombom \
 python -m training.bc_pretrain --demos training/demos.npz \
    --out training/runs/bc --epochs 60 --net-arch 512,512

-# 4. Optional: DAgger from inside Webots if sim-trained doesn't transfer
-tools/auto_dagger.sh 3 60
-python -m tools.dagger_merge_train --out training/runs/bc_dagger
-
-# 5. Evaluate (env)
-python -m training.eval --policy training/runs/bc \
-    --max-flock 10 --max-steps 8000 --n-seeds 5
-
-# 6. Optional RL fine-tune of the BC policy (~40 min on CPU, 1 M steps)
+# 4. KL-PPO fine-tune of the BC policy (~30 min on CPU, 1 M steps)
 python -m training.train_ppo \
    --bc training/runs/bc \
    --out training/runs/rl \
    --total-timesteps 1000000

-# 7. Run in Webots
+# 5. Evaluate (env)
+python -m training.eval --policy training/runs/rl \
+    --max-flock 10 --max-steps 15000 --n-seeds 10
+
+# 6. Run in Webots
 tools/run_webots.sh 10 bc          # behaviour-cloned MLP
 tools/run_webots.sh 10 rl          # KL-PPO fine-tune
 tools/run_webots.sh 10 strombom    # analytic baseline
@@ -84,22 +80,25 @@ tools/run_webots.sh 10 strombom    # analytic baseline
 ## Layout

 ```
-herding/                  — single source of truth (env + Webots both import)
-  geometry.py             — field/pen constants, robot specs
-  flocking_sim.py         — Reynolds-style sheep dynamics
-  diffdrive.py            — differential-drive kinematics
-  control.py              — shared near-sheep speed-modulation helper
+herding/                  — perception / control / world primitives
  obs.py                  — 32-D order-invariant observation builder
-  strombom.py             — canonical CoM-drive teacher
-  sequential.py           — single-target "pin-and-push" teacher
-  active_scan.py          — wraps a base teacher with opening rotation +
-                            walk-to-centre + speed modulation
-  lidar_sim.py            — fast 2D raycast for the env (sheep + walls + posts)
-  lidar_perception.py     — scan → world-frame cluster centroids + filters
-  sheep_tracker.py        — multi-target NN tracker with FOV memory
+  world/                  — environment-side physics & geometry
+    geometry.py             field/pen constants, robot specs
+    diffdrive.py            differential-drive kinematics
+    flocking_sim.py         Reynolds + Strömbom 2014 sheep dynamics
+  perception/             — LiDAR → tracked-sheep pipeline
+    lidar_sim.py            fast 2D raycast for the env
+    lidar_perception.py     scan → world-frame cluster centroids + filters
+    sheep_tracker.py        multi-target NN tracker with FOV memory
+  control/                — every dog mode's action source
+    strombom.py             canonical CoM collect/drive heuristic
+    sequential.py           single-target "pin-and-push" alternative
+    active_scan.py          wraps a base teacher with opening rotation +
+                            walk-to-centre fallback
+    modulation.py           shared near-sheep speed-modulation helper

 controllers/
-  sheep/sheep.py          — Webots sheep controller (uses herding.flocking_sim)
+  sheep/sheep.py          — Webots sheep controller (uses herding.world.flocking_sim)
  shepherd_dog/
    shepherd_dog.py       — Webots dog controller, mode-switched
    policy_loader.py      — lazy SB3 policy loader (auto-detects frame stack)
@@ -107,16 +106,17 @@ controllers/
 training/
  herding_env.py          — Gymnasium env (LiDAR + tracker by default)
  bc_pretrain.py          — supervised BC of (obs, action) demos into MLP
-  eval.py                 — analytic + BC policy comparison harness
-  parity_test.py          — shape / determinism smoke test
+  train_ppo.py            — KL-regularised PPO fine-tune of BC
+  eval.py                 — analytic + learned policy comparison harness
  runs/                   — checkpoints (whitelisted in .gitignore)
  requirements.txt

+tests/
+  parity_test.py          — shape / determinism / baseline smoke test
+
 tools/
  collect_demos.py        — sim demos via the active-scan teacher
-  dagger_merge_train.py   — merge Webots-collected DAgger demos and retrain
  run_webots.sh           — launch Webots with N sheep + chosen mode
-  auto_dagger.sh          — headless DAgger collection across many runs

 worlds/
  field.wbt               — main world (3 m gate, external pen)
@@ -127,8 +127,8 @@ docs/project.md           — original project goals

 ## Shared low-level control

-Every dog mode (RL, Strömbom, Sequential, the DAgger teacher) routes
-its action through `herding/control.py:modulate_speed_near_sheep`,
+Every dog mode (Strömbom, Sequential, BC, RL) routes its action
+through `herding/control/modulation.py:modulate_speed_near_sheep`,
 which scales action magnitude down when within ~2.5 m of the nearest
 tracked sheep. This stops the dog from charging in at full speed and
 scattering the flock. Direction (intent) is preserved.
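
The speed modulation described in the final hunk can be sketched roughly as follows. This is a minimal illustration, not the contents of `herding/control/modulation.py`: the signature, the 2-D action tuple, the linear ramp, and the `MIN_SCALE` floor are all assumptions. Only the function name, the ~2.5 m radius, the `{name: (x, y)}` tracker dict, and the "direction is preserved" property come from the README text.

```python
import math

SLOW_RADIUS_M = 2.5  # "~2.5 m" in the README; the exact constant is assumed
MIN_SCALE = 0.3      # hypothetical floor so the dog never stalls completely


def modulate_speed_near_sheep(action, dog_xy, sheep_positions,
                              slow_radius=SLOW_RADIUS_M, min_scale=MIN_SCALE):
    """Scale an action down when the dog is close to the nearest tracked sheep.

    action          -- assumed 2-D command, e.g. (vx, vy) or (left, right)
    dog_xy          -- dog position (x, y) in the world frame
    sheep_positions -- tracker output shaped like {name: (x, y)}
    """
    if not sheep_positions:
        # Nothing tracked: leave the action untouched.
        return tuple(action)

    # Distance from the dog to the nearest tracked sheep.
    nearest = min(
        math.hypot(sx - dog_xy[0], sy - dog_xy[1])
        for sx, sy in sheep_positions.values()
    )
    if nearest >= slow_radius:
        return tuple(action)

    # Linear ramp (an assumption): min_scale at contact, 1.0 at slow_radius.
    scale = min_scale + (1.0 - min_scale) * (nearest / slow_radius)

    # Every component gets the same factor, so the direction is unchanged
    # and only the magnitude shrinks.
    return tuple(scale * a for a in action)


if __name__ == "__main__":
    tracked = {"sheep_0": (1.25, 0.0), "sheep_1": (6.0, 4.0)}
    print(modulate_speed_near_sheep((1.0, 0.5), (0.0, 0.0), tracked))
```

Applying one scalar to the whole action is what keeps the intent intact: the ratio between components, and therefore the heading the controller asked for, is unchanged regardless of which mode produced the action.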