Drop versioning vocabulary, polish docstrings, fix world-aware policy resolution

User-facing pass following the decision to treat the project as a
single submission with no inner iterations.

* Remove every "v1"/"v2"/"versioning" reference from the docs:
  - README mecanum section trims the "v1 predates the rewrite" prose
    in favour of a self-contained retrain recipe.
  - The 3.2 GB `training/runs/v1_clean/` backup directory is deleted.
* Refresh control-layer docstrings:
  - `sheep_tracker.py` header now describes the three actual pipeline
    stages (consensus, prediction, pen latching) instead of layering
    the consensus stage on top of a stale "predictive mode" preamble.
  - `controllers/shepherd_dog/shepherd_dog.py` mode list is
    up-to-date — adds `universal`, removes outdated single-policy
    default paths, mentions `HERDING_USE_GT=1` as the perception
    ablation.
* Refresh training command examples:
  - `training/bc/collect.py` and `training/bc/pretrain.py` usage
    snippets show the world-suffixed paths the Makefile actually
    uses; the `--out` arg is now required so old "demos.npz"
    invocations error loudly instead of silently overwriting.
  - `training/README.md` rewritten — drops the legacy `runs/bc`
    diagram, documents the per-(drive, world) pipeline, and adds
    the mecanum retraining caveat.
* Fix policy-directory resolution end-to-end:
  - `tools/run_webots.sh` now tries
    `training/runs/{bc,rl}_<drive>_<world>` first, then the drive-
    only path, then the bare-mode legacy path — matching the actual
    on-disk layout. Previously it looked for `bc_<drive>` (no
    world) and silently fell back to `bc`, masking the world
    selection.
  - `controllers/shepherd_dog/shepherd_dog.py:_resolve_policy_dir`
    gets the same fix, which unmasked a latent NameError: the
    function referenced `DRIVE_MODE` before that variable was set at
    module load. The block is restructured so MODE/DRIVE_MODE/WORLD
    are resolved first and then passed to the function as explicit
    arguments.
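
The lookup order described above can be sketched as follows. This is a
minimal illustration, not the code in `shepherd_dog.py` or
`run_webots.sh`: the function names, the `runs_root` parameter, and the
"directory exists" check are assumptions made for the example (the real
resolver may test for a specific model file instead).

```python
from pathlib import Path
from typing import List, Optional


def candidate_policy_dirs(runs_root, mode: str, drive: str, world: str) -> List[Path]:
    """Return candidate policy directories, most specific first.

    Hypothetical helper: mirrors the order in the commit message
    (<mode>_<drive>_<world>, then <mode>_<drive>, then bare <mode>).
    """
    root = Path(runs_root)
    return [
        root / f"{mode}_{drive}_{world}",  # world-aware layout, tried first
        root / f"{mode}_{drive}",          # drive-only path
        root / mode,                       # bare-mode legacy path
    ]


def resolve_policy_dir(runs_root, mode: str, drive: str, world: str) -> Optional[Path]:
    """Return the first candidate that exists on disk, or None.

    mode, drive and world are explicit arguments, so the function never
    depends on module-level globals being assigned before it is called.
    """
    for candidate in candidate_policy_dirs(runs_root, mode, drive, world):
        if candidate.is_dir():
            return candidate
    return None
```

Returning `None` rather than silently picking a default leaves it to the
caller to decide whether a missing policy is an error, which avoids the
kind of masked fallback this commit removes.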

126 pytest cases still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Johnny Fernandes
2026-05-17 01:50:54 +00:00
parent a584a034e9
commit 10c01a938e
7 changed files with 208 additions and 163 deletions
+29 -17
@@ -78,13 +78,11 @@ HERDING_USE_GT=1 tools/run_webots.sh 5 strombom differential field
`make help` lists every target and the overridable hyperparameters.
**Mecanum note**: the `ShepherdDogMecanum.proto` uses physical roller
hinges in Webots (committed 2026-05-16). The Webots calibration shows
a ~60% strafe efficiency and ~28% backward bleed compared to textbook
mecanum; the gym kinematics in `HERDING_MEC_WEBOTS` are tuned to
match. **Mecanum BC/RL policies need to be retrained against this
preset** — see `mecanum_proto_gap.md` in `memory/` for the 3-command
flow. The v1 policies in `training/runs/{bc,rl}_mecanum_*` predate the
proto rewrite and will not herd reliably in Webots until retrained.
hinges in Webots. The Webots calibration shows ~60% strafe efficiency
and ~28% backward bleed compared to textbook mecanum; the gym
kinematics in `HERDING_MEC_WEBOTS` are tuned to match. **Mecanum BC/RL
policies need to be retrained against this preset** — see the retrain
flow in the Mecanum results section below.
## Documentation map
@@ -215,16 +213,30 @@ information.
### Mecanum (differential is the headline)
The `ShepherdDogMecanum.proto` was rewritten on 2026-05-16 with 32
physical roller hinges, giving true omnidirectional motion in Webots
(`tools/calibrate_mecanum.sh` confirms the X-pattern). The mecanum
calibration shows ~60% strafe efficiency vs textbook (vs ~89% on
forward), so v1 mecanum BC/RL policies trained on textbook gym
mecanum no longer herd reliably. The fix is staged but not run:
the gym now has `HERDING_MEC_WEBOTS` which matches Webots' physical
mecanum, and `training/bc/collect.py` / `training/rl/train.py` auto-
select this preset for mecanum runs. Retraining (≈ 2 h per combo,
4 combos) is the documented future step.
`ShepherdDogMecanum.proto` has 32 physical roller hinges giving true
omnidirectional motion in Webots — `tools/calibrate_mecanum.sh`
confirms the X-pattern. Calibration shows ~60% strafe efficiency vs
textbook (versus ~89% on forward), so the gym needs to match the
imperfect physical mecanum for the trained policy to compensate.
`HERDING_MEC_WEBOTS` is the matched preset; `training/bc/collect.py`
and `training/rl/train.py` auto-select it for mecanum runs. Mecanum
policies were trained on the textbook gym, so they need to be
retrained against `HERDING_MEC_WEBOTS` (≈ 2 h per combo, 4 combos):
```bash
python -m training.bc.collect \
--drive-mode mecanum --world field --use-webots-preset \
--out training/bc/demos_mecanum_field.npz
python -m training.bc.pretrain \
--demos training/bc/demos_mecanum_field.npz \
--out training/runs/bc_mecanum_field
python -m training.rl.train \
--bc training/runs/bc_mecanum_field \
--out training/runs/rl_mecanum_field \
--drive-mode mecanum --world field --use-webots-preset
```
Repeat for `field_round`.
## License