Drop versioning vocabulary, polish docstrings, fix world-aware policy resolution

User-facing pass after the project was decided to be a single
submission with no inner iterations.

* Remove every "v1"/"v2"/"versioning" reference from the docs:
  - README mecanum section trims the "v1 predates the rewrite" prose
    in favour of a self-contained retrain recipe.
  - The 3.2 GB `training/runs/v1_clean/` backup directory is deleted.

* Refresh control-layer docstrings:
  - `sheep_tracker.py` header now describes the three actual pipeline
    stages (consensus, prediction, pen latching) instead of layering
    the consensus stage on top of a stale "predictive mode" preamble.
  - `controllers/shepherd_dog/shepherd_dog.py` mode list is
    up-to-date: adds `universal`, removes outdated single-policy
    default paths, mentions `HERDING_USE_GT=1` as the perception
    ablation.

* Refresh training command examples:
  - `training/bc/collect.py` and `training/bc/pretrain.py` usage
    snippets show the world-suffixed paths the Makefile actually uses;
    the `--out` arg is now required so old "demos.npz" invocations
    error loudly instead of silently overwriting.
  - `training/README.md` rewritten: drops the legacy `runs/bc`
    diagram, documents the per-(drive, world) pipeline, and adds the
    mecanum retraining caveat.

* Fix policy-directory resolution end-to-end:
  - `tools/run_webots.sh` now tries
    `training/runs/{bc,rl}_<drive>_<world>` first, then the drive-only
    path, then the bare-mode legacy path, matching the actual on-disk
    layout. Previously it looked for `bc_<drive>` (no world) and
    silently fell back to `bc`, masking the world selection.
  - `controllers/shepherd_dog/shepherd_dog.py:_resolve_policy_dir` has
    the same fix plus a latent NameError unmasked: it referenced
    `DRIVE_MODE` before that variable was set at module load. The
    block is restructured so MODE/DRIVE_MODE/WORLD are resolved first,
    then the function uses them as explicit arguments.

126 pytest cases still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
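
Reviewer note: the lookup order described under "Fix policy-directory resolution" can be sketched as below. This is an illustration only, not the code in `tools/run_webots.sh` or `_resolve_policy_dir`. The function names, the `runs_root` parameter, and the `None` return when nothing matches are assumptions; only the candidate order (world-suffixed, then drive-only, then bare mode) comes from the message above.

```python
from pathlib import Path
from typing import List, Optional


def candidate_policy_dirs(runs_root: Path, mode: str, drive: str, world: str) -> List[Path]:
    """Return candidate directories, most specific first (hypothetical helper)."""
    return [
        runs_root / f"{mode}_{drive}_{world}",  # e.g. bc_mecanum_field
        runs_root / f"{mode}_{drive}",          # drive-only path
        runs_root / mode,                       # bare-mode legacy path
    ]


def resolve_policy_dir(runs_root: Path, mode: str, drive: str, world: str) -> Optional[Path]:
    """Pick the first candidate that exists on disk, or None if none do.

    mode, drive and world are passed explicitly, so the function does not
    depend on module-level variables being set before it is called.
    """
    for candidate in candidate_policy_dirs(runs_root, mode, drive, world):
        if candidate.is_dir():
            return candidate
    return None
```

Taking the three values as arguments mirrors the restructuring described above: the function has no ordering dependency on module-level state, which is the kind of dependency that produced the latent NameError.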
2026-05-17 01:50:54 +00:00
parent a584a034e9
commit 10c01a938e
7 changed files with 208 additions and 163 deletions
@@ -78,13 +78,11 @@ HERDING_USE_GT=1 tools/run_webots.sh 5 strombom differential field
 `make help` lists every target and the overridable hyperparameters.

 **Mecanum note**: the `ShepherdDogMecanum.proto` uses physical roller
-hinges in Webots (committed 2026-05-16). The Webots calibration shows
-a ~60% strafe efficiency and ~28% backward bleed compared to textbook
-mecanum; the gym kinematics in `HERDING_MEC_WEBOTS` are tuned to
-match. **Mecanum BC/RL policies need to be retrained against this
-preset** — see `mecanum_proto_gap.md` in `memory/` for the 3-command
-flow. The v1 policies in `training/runs/{bc,rl}_mecanum_*` predate the
-proto rewrite and will not herd reliably in Webots until retrained.
+hinges in Webots. The Webots calibration shows ~60% strafe efficiency
+and ~28% backward bleed compared to textbook mecanum; the gym
+kinematics in `HERDING_MEC_WEBOTS` are tuned to match. **Mecanum BC/RL
+policies need to be retrained against this preset** — see the retrain
+flow in the Mecanum results section below.

 ## Documentation map

@@ -215,16 +213,30 @@ information.

 ### Mecanum (differential is the headline)

-The `ShepherdDogMecanum.proto` was rewritten on 2026-05-16 with 32
-physical roller hinges, giving true omnidirectional motion in Webots
-(`tools/calibrate_mecanum.sh` confirms the X-pattern). The mecanum
-calibration shows ~60% strafe efficiency vs textbook (vs ~89% on
-forward), so v1 mecanum BC/RL policies trained on textbook gym
-mecanum no longer herd reliably. The fix is staged but not run:
-the gym now has `HERDING_MEC_WEBOTS` which matches Webots' physical
-mecanum, and `training/bc/collect.py` / `training/rl/train.py` auto-
-select this preset for mecanum runs. Retraining (≈ 2 h per combo,
-4 combos) is the documented future step.
+`ShepherdDogMecanum.proto` has 32 physical roller hinges giving true
+omnidirectional motion in Webots — `tools/calibrate_mecanum.sh`
+confirms the X-pattern. Calibration shows ~60% strafe efficiency vs
+textbook (versus ~89% on forward), so the gym needs to match the
+imperfect physical mecanum for the trained policy to compensate.
+`HERDING_MEC_WEBOTS` is the matched preset; `training/bc/collect.py`
+and `training/rl/train.py` auto-select it for mecanum runs. Mecanum
+policies were trained on the textbook gym, so they need to be
+retrained against `HERDING_MEC_WEBOTS` (≈ 2 h per combo, 4 combos):
+
+```bash
+python -m training.bc.collect \
+    --drive-mode mecanum --world field --use-webots-preset \
+    --out training/bc/demos_mecanum_field.npz
+python -m training.bc.pretrain \
+    --demos training/bc/demos_mecanum_field.npz \
+    --out training/runs/bc_mecanum_field
+python -m training.rl.train \
+    --bc training/runs/bc_mecanum_field \
+    --out training/runs/rl_mecanum_field \
+    --drive-mode mecanum --world field --use-webots-preset
+```
+
+Repeat for `field_round`.

 ## License