Drop versioning vocabulary, polish docstrings, fix world-aware policy resolution

User-facing cleanup pass, made after deciding that the project ships as a
single submission with no inner iterations.

* Remove every "v1"/"v2"/"versioning" reference from the docs:
  - README mecanum section trims the "v1 predates the rewrite" prose
    in favour of a self-contained retrain recipe.
  - The 3.2 GB `training/runs/v1_clean/` backup directory is deleted.

* Refresh control-layer docstrings:
  - `sheep_tracker.py` header now describes the three actual pipeline
    stages (consensus, prediction, pen latching) instead of layering
    the consensus stage on top of a stale "predictive mode" preamble.
  - `controllers/shepherd_dog/shepherd_dog.py` mode list is brought
    up to date: it adds `universal`, removes outdated single-policy
    default paths, and mentions `HERDING_USE_GT=1` as the perception
    ablation.

* Refresh training command examples:
  - `training/bc/collect.py` and `training/bc/pretrain.py` usage
    snippets show the world-suffixed paths the Makefile actually
    uses; the `--out` arg is now required, so old "demos.npz"
    invocations error loudly instead of silently overwriting.
  - `training/README.md` is rewritten: it drops the legacy `runs/bc`
    diagram, documents the per-(drive, world) pipeline, and adds
    the mecanum retraining caveat.

* Fix policy-directory resolution end-to-end:
  - `tools/run_webots.sh` now tries
    `training/runs/{bc,rl}_<drive>_<world>` first, then the
    drive-only path, then the bare-mode legacy path, matching the
    actual on-disk layout. Previously it looked for `bc_<drive>`
    (no world) and silently fell back to `bc`, masking the world
    selection.
  - `controllers/shepherd_dog/shepherd_dog.py:_resolve_policy_dir`
    gets the same fix, which also unmasked a latent NameError: the
    function referenced `DRIVE_MODE` before that variable was set at
    module load. The block is restructured so MODE/DRIVE_MODE/WORLD
    are resolved first and then passed to the function as explicit
    arguments.

126 pytest cases still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
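
The required `--out` behaviour mentioned in the commit message can be sketched with `argparse`. This is only an illustration, not the actual `training/bc/collect.py` parser: the `build_parser` and `parse_out` names and the help text are assumptions.

```python
import argparse
from typing import Sequence


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; the real script may define more options.
    parser = argparse.ArgumentParser(prog="collect")
    # required=True makes argparse exit with an error when --out is missing,
    # instead of falling back to a shared default such as "demos.npz".
    parser.add_argument("--out", required=True,
                        help="output .npz path (world-suffixed in the Makefile)")
    return parser


def parse_out(argv: Sequence[str]) -> str:
    """Return the --out value, or let argparse abort if it is absent."""
    return build_parser().parse_args(argv).out
```

With this shape, an invocation without `--out` stops with a usage error and a non-zero exit status rather than writing to a default file.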
@@ -1,42 +1,49 @@
 """Shepherd Dog controller (Webots).

-Mode is selected by ``HERDING_MODE`` (env var, or via the
-``herding_runtime.cfg`` file the launcher writes since Webots strips
-env vars on some setups):
+Mode is selected by ``HERDING_MODE`` — read from the env var or from
+the ``herding_runtime.cfg`` file the launcher writes (Webots strips
+env vars from controller subprocesses on some setups):

   strombom   → canonical Strömbom (2014) collect/drive heuristic
                wrapped in ActiveScanTeacher (opening rotation +
-               walk-to-centre when the tracker briefly empties).
-  sequential → single-target "pin-and-push", same wrapper.
-  bc         → behaviour-cloned MLP, trained on Strömbom demos.
-               Default policy: training/runs/bc/policy.zip.
-  rl         → KL-regularised PPO fine-tune of bc. Same obs/action
-               space as bc; refines time-to-pen via reward while
-               staying anchored to bc.
-               Default policy: training/runs/rl/policy.zip.
+               walk-to-centre when the tracker briefly empties)
+  sequential → single-target "pin-and-push", same wrapper
+  universal  → mecanum-aware teacher (Strömbom + omega + recovery)
+  bc         → behaviour-cloned MLP, trained on universal demos
+  rl         → KL-regularised PPO fine-tune of `bc`
+
+Policy directories are resolved by `policy_loader` from
+``training/runs/{bc,rl}_{drive}_{world}`` with a fallback to
+``training/runs/{bc,rl}`` (legacy single-policy paths).

 Sheep perception
 ----------------
-The dog perceives sheep through its **front-mounted 140° LiDAR**
+The dog perceives sheep through its front-mounted 140° LiDAR
 (``protos/ShepherdDog.proto``: 180 rays, 12 m max range). Each step:

-  1. Reads ``lidar.getRangeImage()``.
-  2. Runs ``herding.perception.lidar_perception.detections_from_scan``
-     to cluster returns into world-frame ``(x, y)`` sheep estimates.
-  3. Folds those into a ``SheepTracker`` which maintains last-seen
-     positions for sheep currently out of FOV and latches "penned"
-     once a track crosses the gate plane south.
+  1. Read ``lidar.getRangeImage()``.
+  2. Cluster returns into world-frame ``(x, y)`` estimates
+     (``herding.perception.lidar_perception.detections_from_scan``).
+  3. Fold detections into a ``SheepTracker``, which maintains
+     last-seen positions for sheep currently out of FOV, requires
+     consensus across multiple frames before promoting a candidate
+     to a real track, and latches "penned" once a track crosses
+     the gate plane south.

-Sheep ``emitter`` messages are read **for diagnostic logging only**
-(GT_penned counter + auto-finish sentinel); they are never used to
-drive the policy. Perception for control comes entirely from LiDAR.
+Setting ``HERDING_USE_GT=1`` bypasses the tracker and feeds emitter
+ground-truth positions to the policy — useful as a perception
+ablation for the analytic baselines.
+
+Sheep emitter messages are otherwise read for diagnostic logging
+only (``GT_penned`` counter + auto-finish sentinel); the control
+loop never depends on them.

 Auto-finish
 -----------
-When the dog observes (via GT, read off the receiver) that all sheep
-are penned, it writes ``training/.run_done`` and the launcher
-(``tools/run_webots.sh``) detects it and closes Webots. This keeps
-batch evaluation runs bounded.
+When every emitter-reported sheep is penned, the controller writes
+``training/.run_done``. The launcher (``tools/run_webots.sh``)
+detects the sentinel and closes Webots so headless sweep runs are
+bounded.
 """

 import math
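
The consensus requirement in step 3 of the new docstring (a candidate must be seen across multiple frames before it becomes a real track) could look roughly like the sketch below. It is not the real `SheepTracker`: the `ConsensusGate` class, the grid-cell matching, the `min_hits` and `cell` parameters, and the choice to require consecutive sightings are all assumptions made for illustration.

```python
from collections import defaultdict


class ConsensusGate:
    """Promote a detection to a track only after repeated sightings.

    Hypothetical stand-in for the consensus stage; not the real SheepTracker.
    """

    def __init__(self, min_hits=3, cell=0.5):
        self.min_hits = min_hits        # sightings needed before promotion
        self.cell = cell                # grid size (m) used to match detections
        self._hits = defaultdict(int)   # candidate cell -> consecutive sightings
        self.tracks = set()             # promoted cells

    def _key(self, x, y):
        # Quantise world-frame (x, y) so nearby detections share a key.
        return (round(x / self.cell), round(y / self.cell))

    def update(self, detections):
        seen = {self._key(x, y) for x, y in detections}
        for key in seen:
            self._hits[key] += 1
            if self._hits[key] >= self.min_hits:
                self.tracks.add(key)
        # Unpromoted candidates that were not re-observed lose their progress.
        for key in list(self._hits):
            if key not in seen and key not in self.tracks:
                del self._hits[key]
        return self.tracks
```

A single spurious LiDAR cluster never reaches `min_hits` under this scheme, while promoted tracks persist when they leave the field of view.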
@@ -111,6 +118,24 @@ MODE = (os.environ.get("HERDING_MODE")
         or _runtime_cfg.get("HERDING_MODE")
         or "bc").lower()

+_VALID_MODES = ("bc", "rl", "strombom", "sequential", "universal", "calibrate")
+if MODE not in _VALID_MODES:
+    print(f"[dog] unknown HERDING_MODE={MODE!r}; defaulting to strombom.")
+    MODE = "strombom"
+
+# Drive mode: "differential" (2-wheel) or "mecanum" (4-wheel omnidirectional).
+DRIVE_MODE = (os.environ.get("HERDING_DRIVE")
+              or _runtime_cfg.get("HERDING_DRIVE")
+              or "differential").lower()
+if DRIVE_MODE not in ("differential", "mecanum"):
+    print(f"[dog] unknown HERDING_DRIVE={DRIVE_MODE!r}; defaulting to differential.")
+    DRIVE_MODE = "differential"
+
+# World shape — used to disambiguate the trained policy directory.
+WORLD = (os.environ.get("HERDING_WORLD")
+         or _runtime_cfg.get("HERDING_WORLD")
+         or "field").lower()
+
 # Diagnostic: bypass LiDAR tracker and use GT emitter positions directly.
 # Set HERDING_USE_GT=1 to isolate perception sim-to-real gap from policy quality.
 USE_GT_PERCEPTION = bool(int(
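
The three settings in this hunk share one pattern: environment variable first, then the runtime cfg, then a default, with an optional whitelist. A possible way to factor that out is sketched below. The `resolve_setting` helper and its parameters are hypothetical and not part of the commit; `fallback` is kept separate from `default` because `HERDING_MODE` defaults to `bc` when unset but falls back to `strombom` when invalid.

```python
import os


def resolve_setting(name, default, valid=None, fallback=None, env=None, cfg=None):
    """Env var first, then runtime cfg, then default; result is lower-cased.

    Hypothetical helper: `env` and `cfg` are injectable so the lookup can be
    tested without touching the real process environment.
    """
    env = os.environ if env is None else env
    cfg = {} if cfg is None else cfg
    value = (env.get(name) or cfg.get(name) or default).lower()
    if valid is not None and value not in valid:
        replacement = default if fallback is None else fallback
        print(f"[dog] unknown {name}={value!r}; defaulting to {replacement}.")
        value = replacement
    return value
```

For example, `resolve_setting("HERDING_MODE", "bc", valid=_VALID_MODES, fallback="strombom", cfg=_runtime_cfg)` would mirror the mode lookup shown in the hunk.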
@@ -119,50 +144,34 @@ USE_GT_PERCEPTION = bool(int(
 ))


-def _resolve_policy_dir(mode: str) -> str:
-    """Where to look for the trained policy for the given mode.
+def _resolve_policy_dir(mode: str, drive: str, world: str) -> str:
+    """Where to look for the trained policy for the given mode/drive/world.

     Priority:
       1. HERDING_POLICY_DIR env var or runtime-cfg entry, if it points
          to a real directory.
-      2. Drive-mode-specific default:
-           bc → training/runs/bc_differential (or bc_mecanum)
-           rl → training/runs/rl_differential (or rl_mecanum)
-      3. Legacy path (no drive suffix):
-           bc → training/runs/bc
-           rl → training/runs/rl
+      2. Canonical: training/runs/{bc,rl}_<drive>_<world>
+      3. Drive-only: training/runs/{bc,rl}_<drive>
+      4. Bare-mode: training/runs/{bc,rl}
+    The first existing directory wins; if none exist, the canonical
+    path is returned so the loader's error message is informative.
     """
     env_dir = (os.environ.get("HERDING_POLICY_DIR")
                or _runtime_cfg.get("HERDING_POLICY_DIR"))
     if env_dir and os.path.isdir(env_dir):
         return env_dir
-    drive = DRIVE_MODE
-    mode_default = {
-        "bc": os.path.join(_PROJECT_ROOT, "training", "runs",
-                           f"bc_{drive}"),
-        "rl": os.path.join(_PROJECT_ROOT, "training", "runs",
-                           f"rl_{drive}"),
-    }
-    primary = mode_default.get(mode, mode_default["bc"])
-    if os.path.isdir(primary):
-        return primary
-    # Fallback: legacy paths without drive suffix.
-    legacy = {
-        "bc": os.path.join(_PROJECT_ROOT, "training", "runs", "bc"),
-        "rl": os.path.join(_PROJECT_ROOT, "training", "runs", "rl"),
-    }
-    fallback = legacy.get(mode, legacy["bc"])
-    if os.path.isdir(fallback):
-        return fallback
-    return env_dir or primary
+    base = "rl" if mode == "rl" else "bc"
+    runs = os.path.join(_PROJECT_ROOT, "training", "runs")
+    for cand in (f"{base}_{drive}_{world}", f"{base}_{drive}", base):
+        path = os.path.join(runs, cand)
+        if os.path.isdir(path):
+            return path
+    return os.path.join(runs, f"{base}_{drive}_{world}")


-_VALID_MODES = ("bc", "rl", "strombom", "sequential", "universal", "calibrate")
-if MODE not in _VALID_MODES:
-    print(f"[dog] unknown HERDING_MODE={MODE!r}; defaulting to strombom.")
-    MODE = "strombom"
+print(f"[dog] mode={MODE} drive={DRIVE_MODE} world={WORLD}")

-POLICY_DIR = _resolve_policy_dir(MODE)
+POLICY_DIR = _resolve_policy_dir(MODE, DRIVE_MODE, WORLD)
 policy_handle = None
 if MODE in ("bc", "rl"):
     print(f"[dog] resolved POLICY_DIR={POLICY_DIR} exists={os.path.isdir(POLICY_DIR)}")
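
The candidate ordering in the new `_resolve_policy_dir` can be exercised in isolation against a temporary directory. The sketch below copies the loop from the hunk but is a standalone variant under stated assumptions: it takes the runs directory as a parameter instead of using `_PROJECT_ROOT`, omits the `HERDING_POLICY_DIR` override, and the `resolve_policy_dir` and `demo` names are hypothetical.

```python
import os
import tempfile


def resolve_policy_dir(runs, mode, drive, world):
    """Return the first existing candidate, else the canonical path."""
    base = "rl" if mode == "rl" else "bc"
    candidates = (f"{base}_{drive}_{world}", f"{base}_{drive}", base)
    for cand in candidates:
        path = os.path.join(runs, cand)
        if os.path.isdir(path):
            return path
    # Nothing exists: return the canonical name so the error is informative.
    return os.path.join(runs, candidates[0])


def demo():
    with tempfile.TemporaryDirectory() as runs:
        # Only the drive-only and bare-mode directories exist at first.
        os.makedirs(os.path.join(runs, "bc_mecanum"))
        os.makedirs(os.path.join(runs, "bc"))
        first = os.path.basename(resolve_policy_dir(runs, "bc", "mecanum", "field"))
        # Once the world-suffixed directory appears, it takes priority.
        os.makedirs(os.path.join(runs, "bc_mecanum_field"))
        second = os.path.basename(resolve_policy_dir(runs, "bc", "mecanum", "field"))
        # No rl directory exists, so the canonical rl path comes back.
        third = os.path.basename(resolve_policy_dir(runs, "rl", "mecanum", "field"))
    return first, second, third
```

Calling `demo()` walks through the drive-only fallback, the world-suffixed match, and the case where no directory exists.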
@@ -173,16 +182,6 @@ if MODE in ("bc", "rl"):
     except Exception as exc:
         print(f"[dog] policy load failed ({exc!r}); falling back to strombom.")
         MODE = "strombom"
-print(f"[dog] running in mode={MODE}")
-
-# Drive mode: "differential" (2-wheel) or "mecanum" (4-wheel omnidirectional).
-DRIVE_MODE = (os.environ.get("HERDING_DRIVE")
-              or _runtime_cfg.get("HERDING_DRIVE")
-              or "differential").lower()
-if DRIVE_MODE not in ("differential", "mecanum"):
-    print(f"[dog] unknown HERDING_DRIVE={DRIVE_MODE!r}; defaulting to differential.")
-    DRIVE_MODE = "differential"
-print(f"[dog] drive mode={DRIVE_MODE}")


 # ---------------------------------------------------------------------------
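
The ordering problem removed by this hunk comes from Python looking up module-level names when a function is called, not when it is defined. The sketch below reproduces it in a simplified form. The `resolve` and `run_module` names are hypothetical, and unlike the real function there is no early return on `HERDING_POLICY_DIR`, which is presumably what kept the original error latent.

```python
OLD_ORDER = '''
def resolve(mode):
    return f"{mode}_{DRIVE_MODE}"   # global is looked up at call time

POLICY_DIR = resolve("bc")          # called before DRIVE_MODE exists
DRIVE_MODE = "differential"
'''

NEW_ORDER = '''
DRIVE_MODE = "differential"

def resolve(mode, drive):
    return f"{mode}_{drive}"        # no hidden dependency on module state

POLICY_DIR = resolve("bc", DRIVE_MODE)
'''


def run_module(source):
    """Execute source as if it were a module body and report the outcome."""
    namespace = {}
    try:
        exec(source, namespace)
    except NameError as exc:
        return f"NameError: {exc}"
    return namespace["POLICY_DIR"]
```

`run_module(OLD_ORDER)` reports the NameError, while `run_module(NEW_ORDER)` returns the resolved name. Passing the values as explicit arguments, as the commit does, makes the dependency visible at the call site.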