Checkpoint 7

2026-05-11 12:21:51 +01:00
parent fce0e0c786
commit a01a5c9cef
34 changed files with 1266 additions and 1038 deletions
@@ -22,10 +22,10 @@ control step:

 1. Read `lidar.getRangeImage()`,
 2. Cluster returns into world-frame `(x, y)` estimates
-   (`herding/lidar_perception.py`),
+   (`herding/perception/lidar_perception.py`),
 3. Fold them into a multi-target tracker that maintains last-seen
   positions for sheep currently outside the FOV
-   (`herding/sheep_tracker.py`).
+   (`herding/perception/sheep_tracker.py`).

 **LiDAR validation** (intermediate-goal item v from `docs/project.md`):
 during development a diagnostic-dump controller captured 80 real
@@ -39,7 +39,7 @@ task.
 The tracker outputs a `{name: (x, y)}` dict shaped exactly like the
 prior receiver-based one, so Strömbom, Sequential, and the BC obs
 builder all run unchanged on top of it. The 2D Gymnasium env
-(`herding/lidar_sim.py`) raycasts sheep discs at training time, so
+(`herding/perception/lidar_sim.py`) raycasts sheep discs at training time, so
 demos collected in the env match the perception the deployed
 controller sees in Webots.

@@ -52,36 +52,32 @@ Privileged ground-truth perception is available for ablation —
 # 1. Set up the Python env (any venv with PyTorch + SB3)
 pip install -r training/requirements.txt

-# 2. Smoke test
-python -m tests.parity_test
+# 2. Smoke test (70 pytest cases, < 1 s)
+make test

-# 3. Reproduce the BC policy (~10 min on CPU: ~5 min demos + ~3 min BC)
-python -m tools.collect_demos --teacher strombom \
-    --out training/demos.npz --seeds-per-n 15 --subsample 3 --frame-stack 4
-python -m training.bc_pretrain --demos training/demos.npz \
-    --out training/runs/bc --epochs 60 --net-arch 512,512
+# 3. Reproduce the full pipeline (~30–60 min CPU)
+make            # demos -> bc -> rl -> eval

-# 4. KL-PPO fine-tune of the BC policy (~30 min on CPU, 1 M steps)
-python -m training.train_ppo \
-    --bc training/runs/bc \
-    --out training/runs/rl \
-    --total-timesteps 1000000
+# Individual stages (each rebuilds upstream artefacts if missing):
+make bc_demos   # sim demos
+make bc         # behaviour clone
+make rl         # KL-PPO fine-tune
+make eval       # 10-seed env eval of rl

-# 5. Evaluate (env)
-python -m training.eval --policy training/runs/rl \
-    --max-flock 10 --max-steps 15000 --n-seeds 10
-
-# 6. Run in Webots
-tools/run_webots.sh 10 bc          # behaviour-cloned MLP
-tools/run_webots.sh 10 rl          # KL-PPO fine-tune
-tools/run_webots.sh 10 strombom    # analytic baseline
+# 4. Run in Webots
+make webots N=10 MODE=bc          # behaviour-cloned MLP
+make webots N=10 MODE=rl          # KL-PPO fine-tune
+make webots N=10 MODE=strombom    # analytic baseline
+# (or invoke directly: tools/run_webots.sh 10 rl)
 ```

+`make help` lists every target and the overridable hyperparameters
+(e.g. `make rl PPO_STEPS=2000000 KL=0.02`).
+
 ## Layout

 ```
 herding/                  — perception / control / world primitives
-  obs.py                  — 32-D order-invariant observation builder
  world/                  — environment-side physics & geometry
    geometry.py             field/pen constants, robot specs
    diffdrive.py            differential-drive kinematics
@@ -90,6 +86,7 @@ herding/                  — perception / control / world primitives
    lidar_sim.py            fast 2D raycast for the env
    lidar_perception.py     scan → world-frame cluster centroids + filters
    sheep_tracker.py        multi-target NN tracker with FOV memory
+    obs.py                  32-D order-invariant observation builder
  control/                — every dog mode's action source
    strombom.py             canonical CoM collect/drive heuristic
    sequential.py           single-target "pin-and-push" alternative
@@ -105,19 +102,28 @@ controllers/

 training/
  herding_env.py          — Gymnasium env (LiDAR + tracker by default)
-  bc_pretrain.py          — supervised BC of (obs, action) demos into MLP
-  train_ppo.py            — KL-regularised PPO fine-tune of BC
+  bc/collect.py           — sim demos via the active-scan teacher
+  bc/pretrain.py          — supervised BC of (obs, action) demos into MLP
+  rl/train.py             — KL-regularised PPO fine-tune of BC
  eval.py                 — analytic + learned policy comparison harness
+  bc/demos.npz            — collected demonstrations (gitignored)
  runs/                   — checkpoints (whitelisted in .gitignore)
  requirements.txt

 tests/
-  parity_test.py          — shape / determinism / baseline smoke test
+  conftest.py             — pytest setup (adds project root to sys.path)
+  test_geometry.py        — geometric predicates + constants
+  test_diffdrive.py       — kinematics and (vx, vy) → wheel-speed map
+  test_obs.py             — observation builder (shape, normalisation, order)
+  test_control.py         — speed modulation + analytic teachers + active scan
+  test_perception.py      — LiDAR sim + clustering + tracker
+  test_env.py             — Gymnasium contract + determinism + reward

 tools/
-  collect_demos.py        — sim demos via the active-scan teacher
  run_webots.sh           — launch Webots with N sheep + chosen mode

+Makefile                  — pipeline orchestrator (make / make rl / make test / …)
+
 worlds/
  field.wbt               — main world (3 m gate, external pen)