Drop versioning vocabulary, polish docstrings, fix world-aware policy resolution

User-facing cleanup pass, made after deciding that the project is a
single submission with no inner iterations.

* Remove every "v1"/"v2"/"versioning" reference from the docs:
  - README mecanum section trims the "v1 predates the rewrite" prose
    in favour of a self-contained retrain recipe.
  - The 3.2 GB `training/runs/v1_clean/` backup directory is deleted.
* Refresh control-layer docstrings:
  - `sheep_tracker.py` header now describes the three actual pipeline
    stages (consensus, prediction, pen latching) instead of layering
    the consensus stage on top of a stale "predictive mode" preamble.
  - `controllers/shepherd_dog/shepherd_dog.py` mode list is brought
    up to date: adds `universal`, removes outdated single-policy
    default paths, and mentions `HERDING_USE_GT=1` as the perception
    ablation.
* Refresh training command examples:
  - `training/bc/collect.py` and `training/bc/pretrain.py` usage
    snippets show the world-suffixed paths the Makefile actually
    uses; the `--out` arg is now required so old "demos.npz"
    invocations error loudly instead of silently overwriting.
  - `training/README.md` rewritten — drops the legacy `runs/bc`
    diagram, documents the per-(drive, world) pipeline, and adds
    the mecanum retraining caveat.
* Fix policy-directory resolution end-to-end:
  - `tools/run_webots.sh` now tries
    `training/runs/{bc,rl}_<drive>_<world>` first, then the drive-
    only path, then the bare-mode legacy path — matching the actual
    on-disk layout. Previously it looked for `bc_<drive>` (no
    world) and silently fell back to `bc`, masking the world
    selection.
  - `controllers/shepherd_dog/shepherd_dog.py:_resolve_policy_dir`
    gets the same fix, and a latent NameError that surfaced along the
    way is repaired: the function referenced `DRIVE_MODE` before that
    variable was assigned at module load. The block is restructured
    so MODE/DRIVE_MODE/WORLD are resolved first and then passed to
    the function as explicit arguments.
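
The resolution order described above can be exercised in isolation. The
sketch below is illustrative only: `resolve_policy_dir` takes the runs
root as an explicit `runs` argument (an assumption made for testability;
the controller derives it from `_PROJECT_ROOT`), it leaves out the
`HERDING_POLICY_DIR` override, and `demo` is a hypothetical helper that
walks the fallback chain in a throwaway directory.

```python
import os
import tempfile


def resolve_policy_dir(runs: str, mode: str, drive: str, world: str) -> str:
    """Return the first existing candidate directory, most specific first."""
    # Any mode other than "rl" resolves against the bc policies.
    base = "rl" if mode == "rl" else "bc"
    canonical = os.path.join(runs, f"{base}_{drive}_{world}")
    for name in (f"{base}_{drive}_{world}", f"{base}_{drive}", base):
        path = os.path.join(runs, name)
        if os.path.isdir(path):
            return path
    # Nothing exists: hand back the canonical path so that whoever tries
    # to load it reports the path that was actually expected.
    return canonical


def demo() -> list:
    """Create the directories from least to most specific and record the winner."""
    winners = []
    with tempfile.TemporaryDirectory() as runs:
        def current() -> str:
            return os.path.basename(
                resolve_policy_dir(runs, "bc", "mecanum", "field"))

        winners.append(current())              # no directory exists yet
        for name in ("bc", "bc_mecanum", "bc_mecanum_field"):
            os.mkdir(os.path.join(runs, name))
            winners.append(current())          # the newest, most specific one wins
    return winners


if __name__ == "__main__":
    print(demo())
```

Each newly created, more specific directory takes over from the previous
winner, which is the behaviour the old drive-only lookup could not
provide for world-suffixed runs.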

126 pytest cases still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Johnny Fernandes
2026-05-17 01:50:54 +00:00
parent a584a034e9
commit 10c01a938e
7 changed files with 208 additions and 163 deletions
+66 -67
@@ -1,42 +1,49 @@
"""Shepherd Dog controller (Webots).
Mode is selected by ``HERDING_MODE`` (env var, or via the
``herding_runtime.cfg`` file the launcher writes since Webots strips
env vars on some setups):
Mode is selected by ``HERDING_MODE`` — read from the env var or from
the ``herding_runtime.cfg`` file the launcher writes (Webots strips
env vars from controller subprocesses on some setups):
strombom → canonical Strömbom (2014) collect/drive heuristic
wrapped in ActiveScanTeacher (opening rotation +
walk-to-centre when the tracker briefly empties).
sequential → single-target "pin-and-push", same wrapper.
bc → behaviour-cloned MLP, trained on Strömbom demos.
Default policy: training/runs/bc/policy.zip.
rl → KL-regularised PPO fine-tune of bc. Same obs/action
space as bc; refines time-to-pen via reward while
staying anchored to bc.
Default policy: training/runs/rl/policy.zip.
walk-to-centre when the tracker briefly empties)
sequential → single-target "pin-and-push", same wrapper
universal → mecanum-aware teacher (Strömbom + omega + recovery)
bc → behaviour-cloned MLP, trained on universal demos
rl → KL-regularised PPO fine-tune of `bc`
Policy directories are resolved by `policy_loader` from
``training/runs/{bc,rl}_{drive}_{world}`` with a fallback to
``training/runs/{bc,rl}`` (legacy single-policy paths).
Sheep perception
----------------
The dog perceives sheep through its **front-mounted 140° LiDAR**
The dog perceives sheep through its front-mounted 140° LiDAR
(``protos/ShepherdDog.proto``: 180 rays, 12 m max range). Each step:
1. Reads ``lidar.getRangeImage()``.
2. Runs ``herding.perception.lidar_perception.detections_from_scan``
to cluster returns into world-frame ``(x, y)`` sheep estimates.
3. Folds those into a ``SheepTracker`` which maintains last-seen
positions for sheep currently out of FOV and latches "penned"
once a track crosses the gate plane south.
1. Read ``lidar.getRangeImage()``.
2. Cluster returns into world-frame ``(x, y)`` estimates
(``herding.perception.lidar_perception.detections_from_scan``).
3. Fold detections into a ``SheepTracker``, which maintains
last-seen positions for sheep currently out of FOV, requires
consensus across multiple frames before promoting a candidate
to a real track, and latches "penned" once a track crosses
the gate plane south.
Sheep ``emitter`` messages are read **for diagnostic logging only**
(GT_penned counter + auto-finish sentinel); they are never used to
drive the policy. Perception for control comes entirely from LiDAR.
Setting ``HERDING_USE_GT=1`` bypasses the tracker and feeds emitter
ground-truth positions to the policy — useful as a perception
ablation for the analytic baselines.
Sheep emitter messages are otherwise read for diagnostic logging
only (``GT_penned`` counter + auto-finish sentinel); the control
loop never depends on them.
Auto-finish
-----------
When the dog observes (via GT, read off the receiver) that all sheep
are penned, it writes ``training/.run_done`` and the launcher
(``tools/run_webots.sh``) detects it and closes Webots. This keeps
batch evaluation runs bounded.
When every emitter-reported sheep is penned, the controller writes
``training/.run_done``. The launcher (``tools/run_webots.sh``)
detects the sentinel and closes Webots so headless sweep runs are
bounded.
"""
import math
@@ -111,6 +118,24 @@ MODE = (os.environ.get("HERDING_MODE")
or _runtime_cfg.get("HERDING_MODE")
or "bc").lower()
_VALID_MODES = ("bc", "rl", "strombom", "sequential", "universal", "calibrate")
if MODE not in _VALID_MODES:
print(f"[dog] unknown HERDING_MODE={MODE!r}; defaulting to strombom.")
MODE = "strombom"
# Drive mode: "differential" (2-wheel) or "mecanum" (4-wheel omnidirectional).
DRIVE_MODE = (os.environ.get("HERDING_DRIVE")
or _runtime_cfg.get("HERDING_DRIVE")
or "differential").lower()
if DRIVE_MODE not in ("differential", "mecanum"):
print(f"[dog] unknown HERDING_DRIVE={DRIVE_MODE!r}; defaulting to differential.")
DRIVE_MODE = "differential"
# World shape — used to disambiguate the trained policy directory.
WORLD = (os.environ.get("HERDING_WORLD")
or _runtime_cfg.get("HERDING_WORLD")
or "field").lower()
# Diagnostic: bypass LiDAR tracker and use GT emitter positions directly.
# Set HERDING_USE_GT=1 to isolate perception sim-to-real gap from policy quality.
USE_GT_PERCEPTION = bool(int(
@@ -119,50 +144,34 @@ USE_GT_PERCEPTION = bool(int(
))
def _resolve_policy_dir(mode: str) -> str:
"""Where to look for the trained policy for the given mode.
def _resolve_policy_dir(mode: str, drive: str, world: str) -> str:
"""Where to look for the trained policy for the given mode/drive/world.
Priority:
1. HERDING_POLICY_DIR env var or runtime-cfg entry, if it points
to a real directory.
2. Drive-mode-specific default:
bc → training/runs/bc_differential (or bc_mecanum)
rl → training/runs/rl_differential (or rl_mecanum)
3. Legacy path (no drive suffix):
bc → training/runs/bc
rl → training/runs/rl
2. Canonical: training/runs/{bc,rl}_<drive>_<world>
3. Drive-only: training/runs/{bc,rl}_<drive>
4. Bare-mode: training/runs/{bc,rl}
The first existing directory wins; if none exist, the canonical
path is returned so the loader's error message is informative.
"""
env_dir = (os.environ.get("HERDING_POLICY_DIR")
or _runtime_cfg.get("HERDING_POLICY_DIR"))
if env_dir and os.path.isdir(env_dir):
return env_dir
drive = DRIVE_MODE
mode_default = {
"bc": os.path.join(_PROJECT_ROOT, "training", "runs",
f"bc_{drive}"),
"rl": os.path.join(_PROJECT_ROOT, "training", "runs",
f"rl_{drive}"),
}
primary = mode_default.get(mode, mode_default["bc"])
if os.path.isdir(primary):
return primary
# Fallback: legacy paths without drive suffix.
legacy = {
"bc": os.path.join(_PROJECT_ROOT, "training", "runs", "bc"),
"rl": os.path.join(_PROJECT_ROOT, "training", "runs", "rl"),
}
fallback = legacy.get(mode, legacy["bc"])
if os.path.isdir(fallback):
return fallback
return env_dir or primary
base = "rl" if mode == "rl" else "bc"
runs = os.path.join(_PROJECT_ROOT, "training", "runs")
for cand in (f"{base}_{drive}_{world}", f"{base}_{drive}", base):
path = os.path.join(runs, cand)
if os.path.isdir(path):
return path
return os.path.join(runs, f"{base}_{drive}_{world}")
_VALID_MODES = ("bc", "rl", "strombom", "sequential", "universal", "calibrate")
if MODE not in _VALID_MODES:
print(f"[dog] unknown HERDING_MODE={MODE!r}; defaulting to strombom.")
MODE = "strombom"
print(f"[dog] mode={MODE} drive={DRIVE_MODE} world={WORLD}")
POLICY_DIR = _resolve_policy_dir(MODE)
POLICY_DIR = _resolve_policy_dir(MODE, DRIVE_MODE, WORLD)
policy_handle = None
if MODE in ("bc", "rl"):
print(f"[dog] resolved POLICY_DIR={POLICY_DIR} exists={os.path.isdir(POLICY_DIR)}")
@@ -173,16 +182,6 @@ if MODE in ("bc", "rl"):
except Exception as exc:
print(f"[dog] policy load failed ({exc!r}); falling back to strombom.")
MODE = "strombom"
print(f"[dog] running in mode={MODE}")
# Drive mode: "differential" (2-wheel) or "mecanum" (4-wheel omnidirectional).
DRIVE_MODE = (os.environ.get("HERDING_DRIVE")
or _runtime_cfg.get("HERDING_DRIVE")
or "differential").lower()
if DRIVE_MODE not in ("differential", "mecanum"):
print(f"[dog] unknown HERDING_DRIVE={DRIVE_MODE!r}; defaulting to differential.")
DRIVE_MODE = "differential"
print(f"[dog] drive mode={DRIVE_MODE}")
# ---------------------------------------------------------------------------