Drop versioning vocabulary, polish docstrings, fix world-aware policy resolution

User-facing pass made after deciding that the project is a single
submission with no inner iterations.

* Remove every "v1"/"v2"/"versioning" reference from the docs:
  - README mecanum section trims the "v1 predates the rewrite" prose
    in favour of a self-contained retrain recipe.
  - The 3.2 GB `training/runs/v1_clean/` backup directory is deleted.

* Refresh control-layer docstrings:
  - `sheep_tracker.py` header now describes the three actual pipeline
    stages (consensus, prediction, pen latching) instead of layering
    the consensus stage on top of a stale "predictive mode" preamble.
  - `controllers/shepherd_dog/shepherd_dog.py` mode list is up to
    date: it adds `universal`, removes outdated single-policy default
    paths, and mentions `HERDING_USE_GT=1` as the perception ablation.

* Refresh training command examples:
  - `training/bc/collect.py` and `training/bc/pretrain.py` usage
    snippets show the world-suffixed paths the Makefile actually
    uses; the `--out` arg is now required, so old "demos.npz"
    invocations error loudly instead of silently overwriting.
  - `training/README.md` is rewritten: it drops the legacy `runs/bc`
    diagram, documents the per-(drive, world) pipeline, and adds the
    mecanum retraining caveat.

* Fix policy-directory resolution end-to-end:
  - `tools/run_webots.sh` now tries
    `training/runs/{bc,rl}_<drive>_<world>` first, then the
    drive-only path, then the bare-mode legacy path, matching the
    actual on-disk layout. Previously it looked for `bc_<drive>` (no
    world) and silently fell back to `bc`, masking the world
    selection.
  - `controllers/shepherd_dog/shepherd_dog.py:_resolve_policy_dir`
    gets the same fix, which also unmasked a latent NameError: the
    function referenced `DRIVE_MODE` before that variable was set at
    module load. The block is restructured so that
    MODE/DRIVE_MODE/WORLD are resolved first and then passed to the
    function as explicit arguments.

126 pytest cases still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
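
The lookup order described for policy-directory resolution can be illustrated with a minimal, standalone sketch. It is not the controller code: `resolve_policy_dir` and its `runs_root` parameter are hypothetical names chosen for illustration, the `HERDING_POLICY_DIR` override is left out, and the directories are created in a temporary folder rather than under `training/runs`.

```python
import os
import tempfile


def resolve_policy_dir(runs_root: str, mode: str, drive: str, world: str) -> str:
    """Return the first existing candidate directory, else the canonical path."""
    base = "rl" if mode == "rl" else "bc"
    # Most specific first: <base>_<drive>_<world>, then <base>_<drive>, then <base>.
    candidates = (f"{base}_{drive}_{world}", f"{base}_{drive}", base)
    for name in candidates:
        path = os.path.join(runs_root, name)
        if os.path.isdir(path):
            return path
    # Nothing exists: return the canonical path so a later load error names it.
    return os.path.join(runs_root, candidates[0])


with tempfile.TemporaryDirectory() as runs:
    os.mkdir(os.path.join(runs, "bc"))
    os.mkdir(os.path.join(runs, "bc_mecanum"))
    # No world-specific directory yet: the drive-only path wins over bare "bc".
    drive_only = resolve_policy_dir(runs, "bc", "mecanum", "field")

    os.mkdir(os.path.join(runs, "bc_mecanum_field"))
    # Once the world-suffixed directory exists, it takes priority.
    canonical = resolve_policy_dir(runs, "bc", "mecanum", "field")

    # No rl directories at all: the canonical (non-existent) path comes back.
    missing = resolve_policy_dir(runs, "rl", "differential", "field")

    print(os.path.basename(drive_only))
    print(os.path.basename(canonical))
    print(os.path.basename(missing))
```

Passing `drive` and `world` as arguments, rather than reading module globals inside the function, is what removes the ordering dependency behind the NameError mentioned above.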
2026-05-17 01:50:54 +00:00
parent a584a034e9
commit 10c01a938e
7 changed files with 208 additions and 163 deletions
@@ -1,42 +1,49 @@
 """Shepherd Dog controller (Webots).

-Mode is selected by ``HERDING_MODE`` (env var, or via the
-``herding_runtime.cfg`` file the launcher writes since Webots strips
-env vars on some setups):
+Mode is selected by ``HERDING_MODE`` — read from the env var or from
+the ``herding_runtime.cfg`` file the launcher writes (Webots strips
+env vars from controller subprocesses on some setups):

    strombom    → canonical Strömbom (2014) collect/drive heuristic
                  wrapped in ActiveScanTeacher (opening rotation +
-                  walk-to-centre when the tracker briefly empties).
-    sequential  → single-target "pin-and-push", same wrapper.
-    bc          → behaviour-cloned MLP, trained on Strömbom demos.
-                  Default policy: training/runs/bc/policy.zip.
-    rl          → KL-regularised PPO fine-tune of bc. Same obs/action
-                  space as bc; refines time-to-pen via reward while
-                  staying anchored to bc.
-                  Default policy: training/runs/rl/policy.zip.
+                  walk-to-centre when the tracker briefly empties)
+    sequential  → single-target "pin-and-push", same wrapper
+    universal   → mecanum-aware teacher (Strömbom + omega + recovery)
+    bc          → behaviour-cloned MLP, trained on universal demos
+    rl          → KL-regularised PPO fine-tune of `bc`
+
+Policy directories are resolved by `policy_loader` from
+``training/runs/{bc,rl}_{drive}_{world}``, falling back to
+``training/runs/{bc,rl}_{drive}`` and then ``training/runs/{bc,rl}``.

 Sheep perception
 ----------------
-The dog perceives sheep through its **front-mounted 140° LiDAR**
+The dog perceives sheep through its front-mounted 140° LiDAR
 (``protos/ShepherdDog.proto``: 180 rays, 12 m max range). Each step:

-    1. Reads ``lidar.getRangeImage()``.
-    2. Runs ``herding.perception.lidar_perception.detections_from_scan``
-       to cluster returns into world-frame ``(x, y)`` sheep estimates.
-    3. Folds those into a ``SheepTracker`` which maintains last-seen
-       positions for sheep currently out of FOV and latches "penned"
-       once a track crosses the gate plane south.
+    1. Read ``lidar.getRangeImage()``.
+    2. Cluster returns into world-frame ``(x, y)`` estimates
+       (``herding.perception.lidar_perception.detections_from_scan``).
+    3. Fold detections into a ``SheepTracker``, which maintains
+       last-seen positions for sheep currently out of FOV, requires
+       consensus across multiple frames before promoting a candidate
+       to a real track, and latches "penned" once a track crosses
+       the gate plane south.

-Sheep ``emitter`` messages are read **for diagnostic logging only**
-(GT_penned counter + auto-finish sentinel); they are never used to
-drive the policy. Perception for control comes entirely from LiDAR.
+Setting ``HERDING_USE_GT=1`` bypasses the tracker and feeds emitter
+ground-truth positions to the policy — useful as a perception
+ablation for the analytic baselines.
+
+Sheep emitter messages are otherwise read for diagnostic logging
+only (``GT_penned`` counter + auto-finish sentinel); the control
+loop never depends on them.

 Auto-finish
 -----------
-When the dog observes (via GT, read off the receiver) that all sheep
-are penned, it writes ``training/.run_done`` and the launcher
-(``tools/run_webots.sh``) detects it and closes Webots. This keeps
-batch evaluation runs bounded.
+When every emitter-reported sheep is penned, the controller writes
+``training/.run_done``. The launcher (``tools/run_webots.sh``)
+detects the sentinel and closes Webots so headless sweep runs are
+bounded.
 """

 import math
@@ -111,6 +118,24 @@ MODE = (os.environ.get("HERDING_MODE")
        or _runtime_cfg.get("HERDING_MODE")
        or "bc").lower()

+_VALID_MODES = ("bc", "rl", "strombom", "sequential", "universal", "calibrate")
+if MODE not in _VALID_MODES:
+    print(f"[dog] unknown HERDING_MODE={MODE!r}; defaulting to strombom.")
+    MODE = "strombom"
+
+# Drive mode: "differential" (2-wheel) or "mecanum" (4-wheel omnidirectional).
+DRIVE_MODE = (os.environ.get("HERDING_DRIVE")
+              or _runtime_cfg.get("HERDING_DRIVE")
+              or "differential").lower()
+if DRIVE_MODE not in ("differential", "mecanum"):
+    print(f"[dog] unknown HERDING_DRIVE={DRIVE_MODE!r}; defaulting to differential.")
+    DRIVE_MODE = "differential"
+
+# World shape — used to disambiguate the trained policy directory.
+WORLD = (os.environ.get("HERDING_WORLD")
+         or _runtime_cfg.get("HERDING_WORLD")
+         or "field").lower()
+
 # Diagnostic: bypass LiDAR tracker and use GT emitter positions directly.
 # Set HERDING_USE_GT=1 to isolate perception sim-to-real gap from policy quality.
 USE_GT_PERCEPTION = bool(int(
@@ -119,50 +144,34 @@ USE_GT_PERCEPTION = bool(int(
 ))


-def _resolve_policy_dir(mode: str) -> str:
-    """Where to look for the trained policy for the given mode.
+def _resolve_policy_dir(mode: str, drive: str, world: str) -> str:
+    """Where to look for the trained policy for the given mode/drive/world.

    Priority:
      1. HERDING_POLICY_DIR env var or runtime-cfg entry, if it points
         to a real directory.
-      2. Drive-mode-specific default:
-            bc → training/runs/bc_differential (or bc_mecanum)
-            rl → training/runs/rl_differential (or rl_mecanum)
-      3. Legacy path (no drive suffix):
-            bc → training/runs/bc
-            rl → training/runs/rl
+      2. Canonical:  training/runs/{bc,rl}_<drive>_<world>
+      3. Drive-only: training/runs/{bc,rl}_<drive>
+      4. Bare-mode:  training/runs/{bc,rl}
+    The first existing directory wins; if none exist, the canonical
+    path is returned so the loader's error message is informative.
    """
    env_dir = (os.environ.get("HERDING_POLICY_DIR")
               or _runtime_cfg.get("HERDING_POLICY_DIR"))
    if env_dir and os.path.isdir(env_dir):
        return env_dir
-    drive = DRIVE_MODE
-    mode_default = {
-        "bc": os.path.join(_PROJECT_ROOT, "training", "runs",
-                           f"bc_{drive}"),
-        "rl": os.path.join(_PROJECT_ROOT, "training", "runs",
-                           f"rl_{drive}"),
-    }
-    primary = mode_default.get(mode, mode_default["bc"])
-    if os.path.isdir(primary):
-        return primary
-    # Fallback: legacy paths without drive suffix.
-    legacy = {
-        "bc": os.path.join(_PROJECT_ROOT, "training", "runs", "bc"),
-        "rl": os.path.join(_PROJECT_ROOT, "training", "runs", "rl"),
-    }
-    fallback = legacy.get(mode, legacy["bc"])
-    if os.path.isdir(fallback):
-        return fallback
-    return env_dir or primary
+    base = "rl" if mode == "rl" else "bc"
+    runs = os.path.join(_PROJECT_ROOT, "training", "runs")
+    for cand in (f"{base}_{drive}_{world}", f"{base}_{drive}", base):
+        path = os.path.join(runs, cand)
+        if os.path.isdir(path):
+            return path
+    return os.path.join(runs, f"{base}_{drive}_{world}")


-_VALID_MODES = ("bc", "rl", "strombom", "sequential", "universal", "calibrate")
-if MODE not in _VALID_MODES:
-    print(f"[dog] unknown HERDING_MODE={MODE!r}; defaulting to strombom.")
-    MODE = "strombom"
+print(f"[dog] mode={MODE}  drive={DRIVE_MODE}  world={WORLD}")

-POLICY_DIR = _resolve_policy_dir(MODE)
+POLICY_DIR = _resolve_policy_dir(MODE, DRIVE_MODE, WORLD)
 policy_handle = None
 if MODE in ("bc", "rl"):
    print(f"[dog] resolved POLICY_DIR={POLICY_DIR}  exists={os.path.isdir(POLICY_DIR)}")
@@ -173,16 +182,6 @@ if MODE in ("bc", "rl"):
    except Exception as exc:
        print(f"[dog] policy load failed ({exc!r}); falling back to strombom.")
        MODE = "strombom"
-print(f"[dog] running in mode={MODE}")
-
-# Drive mode: "differential" (2-wheel) or "mecanum" (4-wheel omnidirectional).
-DRIVE_MODE = (os.environ.get("HERDING_DRIVE")
-              or _runtime_cfg.get("HERDING_DRIVE")
-              or "differential").lower()
-if DRIVE_MODE not in ("differential", "mecanum"):
-    print(f"[dog] unknown HERDING_DRIVE={DRIVE_MODE!r}; defaulting to differential.")
-    DRIVE_MODE = "differential"
-print(f"[dog] drive mode={DRIVE_MODE}")


 # ---------------------------------------------------------------------------