Drop versioning vocabulary, polish docstrings, fix world-aware policy resolution

User-facing cleanup pass, made after deciding that the project is a
single submission with no inner iterations.

* Remove every "v1"/"v2"/"versioning" reference from the docs:
  - The README mecanum section drops the "v1 predates the rewrite"
    prose in favour of a self-contained retrain recipe.
  - The 3.2 GB `training/runs/v1_clean/` backup directory is deleted.

* Refresh control-layer docstrings:
  - The `sheep_tracker.py` header now describes the three actual
    pipeline stages (consensus, prediction, pen latching) instead of
    layering the consensus stage on top of a stale "predictive mode"
    preamble.
  - The `controllers/shepherd_dog/shepherd_dog.py` mode list is up to
    date: it adds `universal`, removes outdated single-policy default
    paths, and mentions `HERDING_USE_GT=1` as the perception ablation.

* Refresh training command examples:
  - The `training/bc/collect.py` and `training/bc/pretrain.py` usage
    snippets show the world-suffixed paths the Makefile actually
    uses. The `--out` argument is now required, so old "demos.npz"
    invocations error loudly instead of silently overwriting.
  - `training/README.md` is rewritten: it drops the legacy `runs/bc`
    diagram, documents the per-(drive, world) pipeline, and adds the
    mecanum retraining caveat.
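
To illustrate the required `--out` behaviour, here is a minimal argparse sketch. It is not the actual `collect.py` parser: the defaults and the `parse_or_none` helper are assumptions made for the example, and only the `--out`, `--drive-mode`, and `--world` flag names come from this change.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: flag names follow the commit, defaults are assumed.
    parser = argparse.ArgumentParser(prog="collect")
    parser.add_argument("--drive-mode", choices=["differential", "mecanum"],
                        default="differential")
    parser.add_argument("--world", default="field")
    # No default here: leaving out --out makes argparse print an error and
    # exit instead of writing to a shared fallback file.
    parser.add_argument("--out", required=True)
    return parser


def parse_or_none(argv: List[str]) -> Optional[argparse.Namespace]:
    # Helper for the example only: turn argparse's SystemExit into None.
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        return None
```

With this shape, `parse_or_none(["--world", "field"])` returns `None` because `--out` is missing, while passing `--out training/bc/demos_mecanum_field.npz` parses normally.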

* Fix policy-directory resolution end-to-end:
  - `tools/run_webots.sh` now tries
    `training/runs/{bc,rl}_<drive>_<world>` first, then the
    drive-only path, then the bare-mode legacy path, matching the
    actual on-disk layout. Previously it looked for `bc_<drive>` (no
    world) and silently fell back to `bc`, masking the world
    selection.
  - `controllers/shepherd_dog/shepherd_dog.py:_resolve_policy_dir`
    gets the same fix, which also unmasked a latent NameError: the
    function referenced `DRIVE_MODE` before that variable was set at
    module load. The block is restructured so MODE/DRIVE_MODE/WORLD
    are resolved first and then passed to the function as explicit
    arguments.
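
The lookup order described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in `_resolve_policy_dir`: the function names, the `runs_root` parameter, and the `None` return for "nothing found" are hypothetical. Only the three-step order and the use of explicit arguments come from this change.

```python
from pathlib import Path
from typing import List, Optional


def candidate_dirs(runs_root: Path, mode: str, drive: str, world: str) -> List[Path]:
    # Most specific first: <mode>_<drive>_<world>, then <mode>_<drive>,
    # then the bare-mode legacy directory.
    return [
        runs_root / f"{mode}_{drive}_{world}",
        runs_root / f"{mode}_{drive}",
        runs_root / mode,
    ]


def resolve_policy_dir(runs_root: Path, mode: str, drive: str, world: str) -> Optional[Path]:
    # Everything arrives as an argument, so the function cannot hit a
    # module-level name that has not been assigned yet.
    for candidate in candidate_dirs(runs_root, mode, drive, world):
        if candidate.is_dir():
            return candidate
    return None
```

For example, with only `bc` and `bc_mecanum_field` on disk, resolving `("bc", "mecanum", "field")` picks `bc_mecanum_field`, while `("bc", "mecanum", "field_round")` falls through to `bc`.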

126 pytest cases still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@@ -78,13 +78,11 @@ HERDING_USE_GT=1 tools/run_webots.sh 5 strombom differential field
 `make help` lists every target and the overridable hyperparameters.
 
 **Mecanum note**: the `ShepherdDogMecanum.proto` uses physical roller
-hinges in Webots (committed 2026-05-16). The Webots calibration shows
-a ~60% strafe efficiency and ~28% backward bleed compared to textbook
-mecanum; the gym kinematics in `HERDING_MEC_WEBOTS` are tuned to
-match. **Mecanum BC/RL policies need to be retrained against this
-preset** — see `mecanum_proto_gap.md` in `memory/` for the 3-command
-flow. The v1 policies in `training/runs/{bc,rl}_mecanum_*` predate the
-proto rewrite and will not herd reliably in Webots until retrained.
+hinges in Webots. The Webots calibration shows ~60% strafe efficiency
+and ~28% backward bleed compared to textbook mecanum; the gym
+kinematics in `HERDING_MEC_WEBOTS` are tuned to match. **Mecanum BC/RL
+policies need to be retrained against this preset** — see the retrain
+flow in the Mecanum results section below.
 
 ## Documentation map
 
@@ -215,16 +213,30 @@ information.
 
 ### Mecanum (differential is the headline)
 
-The `ShepherdDogMecanum.proto` was rewritten on 2026-05-16 with 32
-physical roller hinges, giving true omnidirectional motion in Webots
-(`tools/calibrate_mecanum.sh` confirms the X-pattern). The mecanum
-calibration shows ~60% strafe efficiency vs textbook (vs ~89% on
-forward), so v1 mecanum BC/RL policies trained on textbook gym
-mecanum no longer herd reliably. The fix is staged but not run:
-the gym now has `HERDING_MEC_WEBOTS` which matches Webots' physical
-mecanum, and `training/bc/collect.py` / `training/rl/train.py` auto-
-select this preset for mecanum runs. Retraining (≈ 2 h per combo,
-4 combos) is the documented future step.
+`ShepherdDogMecanum.proto` has 32 physical roller hinges giving true
+omnidirectional motion in Webots — `tools/calibrate_mecanum.sh`
+confirms the X-pattern. Calibration shows ~60% strafe efficiency vs
+textbook (versus ~89% on forward), so the gym needs to match the
+imperfect physical mecanum for the trained policy to compensate.
+`HERDING_MEC_WEBOTS` is the matched preset; `training/bc/collect.py`
+and `training/rl/train.py` auto-select it for mecanum runs. Mecanum
+policies were trained on the textbook gym, so they need to be
+retrained against `HERDING_MEC_WEBOTS` (≈ 2 h per combo, 4 combos):
+
+```bash
+python -m training.bc.collect \
+    --drive-mode mecanum --world field --use-webots-preset \
+    --out training/bc/demos_mecanum_field.npz
+python -m training.bc.pretrain \
+    --demos training/bc/demos_mecanum_field.npz \
+    --out training/runs/bc_mecanum_field
+python -m training.rl.train \
+    --bc training/runs/bc_mecanum_field \
+    --out training/runs/rl_mecanum_field \
+    --drive-mode mecanum --world field --use-webots-preset
+```
+
+Repeat for `field_round`.
 
 ## License
 