Compare commits

..

25 Commits

Author SHA1 Message Date
Johnny Fernandes 0a27ad9a26 Full retrain pipeline + hybrid policy set
Ran end-to-end clean retrain + gym eval + 24-cell Webots sweep
(tools/full_pipeline.sh). Results:

  Differential — all 16 cells pen N/N. Updated policies committed.
  Mecanum     — new training stochastically regressed (only 2/8 cells
                vs the v2 baseline's 4/8). v2 baseline mec policies
                are RESTORED in this commit (training/runs/{bc,rl}_
                mecanum_*) — they remain the deliverable.

The retrain pipeline itself is committed for reproducibility
(tools/full_pipeline.sh: clean → train_all → eval_all → 24-cell
Webots sweep). The v2 mec policies are also backed up locally to
_backup_pretrain/mec_v2_baseline/ (gitignored).

Verified after restore:
  bc mec field_round n=10 → 10/10 in 147 s sim
  rl diff field n=5      → 5/5  in 137 s sim
2026-05-20 08:07:39 +00:00
Johnny Fernandes 07d1ece3d4 Allow strafe_efficiency=1.0 in mec preset test; minor comment cleanup
After a deep investigation into the n=5 mecanum sim-to-real gap, all
attempted fixes (consensus tightening, wall_reject tightening, static-
phantom drop, deploy-time track merge, in-tracker track merge,
fp_rate-augmented retrain, max_range cap, 140° mecanum retrain) failed
to reliably pen n=5 in Webots without regressing n=10. The phantom
problem at 360° + small flock is genuinely hard and out of scope for
the deadline; documented in docs/status.md.

Result preserved from the previous mecanum work:
* 16/16 differential cells pen N/N.
* 4/8 mecanum cells (all n=10) pen 10/10 via Supervisor kinematic
  injection (commit 27c0f65).
* n=5 mecanum is the known gap.

Small changes that survived the iteration:
* tests/test_config.py — strafe_efficiency=1.0 is now valid (kinematic
  injection means the gym preset and Webots controller share the
  formula, so textbook values produce gym-identical body motion).
* tools/run_webots.sh — refreshed the LiDAR-variant comment.
* training/rl/train.py — comment polish.
2026-05-19 15:57:27 +00:00
Johnny Fernandes 62ea811655 Fix _h_ema NameError; add status + article-draft notes
- shepherd_dog: a leftover reference to the removed HERDING_HEADING_EMA
  helper raised NameError on every controller startup. Drop it.
- docs/status.md: expand the n=5 mecanum failure-mode discussion with
  the four phantom-suppression attempts that didn't pan out, and the
  honest workaround (Webots reports n=10 only, n=5 covered by gym
  results).
- docs/article_draft.md: project-report outline with section structure,
  results tables, and the mecanum sim-to-real narrative for the
  formal writeup.
2026-05-19 01:11:49 +00:00
Johnny Fernandes 27c0f65722 Mecanum Webots via Supervisor kinematic injection
Replace the failing ODE-rolled mecanum chassis dynamics with a
Supervisor.setVelocity call that uses the gym mecanum forward
kinematics formula directly. Wheel motors still spin (visual);
chassis motion comes from the gym model so training and deployment
match by construction.

Results (seed=42, n=10 sheep): BC + RL mecanum pen 10/10 in both
field and field_round. n=5 mecanum cells still 0/5 due to tracker
phantoms anchored to wall corners under the 360° LiDAR — documented
in docs/status.md as the remaining gap.

Cleanup: drop deploy-time hacks (HERDING_HEADING_*, HERDING_OMEGA_CLAMP,
HERDING_TRACKER_*) that were workarounds for the old ODE chaos;
revert the proto inertiaMatrix, roller dampingConstant, and reduced
motor torque since they no longer carry load; refresh comments
around the mecanum config presets.
2026-05-18 22:46:37 +00:00
Johnny Fernandes 1df84ae4b5 Drop webots_quick target; mecanum BC demos now auto-use HERDING_MEC_WEBOTS
* Remove `webots_quick` Makefile target — `make webots` is the only
  webots entry point now (it fires the interactive picker). The
  positional non-interactive path is still available as
  `bash tools/run_webots.sh N MODE DRIVE WORLD` for scripted use.
* Add `WEBOTS_PRESET_FLAG = --use-webots-preset` for mecanum drive
  and pass it to the `bc.collect` recipe so demos are collected
  under the gym kinematics that match the physical-roller Webots
  mecanum. Without this, mecanum BC demos would record textbook
  X-pattern teacher actions against textbook gym kinematics, and
  the resulting policy would fail at deployment exactly the same
  way the current v1 mecanum policies do.
* `rl/train.py` already auto-detects mecanum and applies
  HERDING_MEC_WEBOTS internally (commit 3b4c99a), so the rl recipe
  doesn't need the flag — a one-line comment in the Makefile makes
  that intent explicit.

Diff drive keeps the existing recipe: no --use-webots-preset, so
BC demos collected on HERDING_DEFAULT (360° gym, no FP). This is
the regime that produced the current diff/field and diff/round
policies that pen 5/5 in Webots LiDAR; retraining under the same
regime is the safest reproduction.

126 pytest cases still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 10:44:15 +00:00
Johnny Fernandes e86fee5ae8 Per-sheep pen-time metrics, seed support, make webots → menu
* `controllers/shepherd_dog/shepherd_dog.py`
  - Tracks the first step at which each sheep crosses the gate; on
    auto-finish (all sheep penned) prints a `[results]` summary
    block: mode/drive/world/lidar/dogs/seed line, total simulated
    time, per-sheep penning order with absolute step + seconds since
    sim start, and the gate spread between the first and last
    penning.
  - Reads `HERDING_SEED` (env / runtime cfg) and seeds the
    controller's RNG when set. Empty = time-based default = old
    non-deterministic behaviour.
* `controllers/sheep/sheep.py` reads `HERDING_SEED` the same way
  (loading `herding_runtime.cfg` itself so it works even when
  Webots strips env vars) and seeds Python's RNG XOR'd with the
  sheep's name hash, so a fixed seed gives a reproducible flock
  trajectory without all sheep starting from identical wander state.
* `tools/run_webots.sh` writes `HERDING_SEED` into the runtime cfg
  (empty when unset so existing scripts keep their stochastic
  behaviour).
* `tools/webots_menu.sh` gains a Seed prompt (random / fixed
  integer); the launch summary box shows the choice next to the
  perception row.
* `Makefile`
  - `make webots`  now fires the interactive picker (replacing the
    old positional invocation).
  - `make webots_quick MODE=… DRIVE=… WORLD=… N=…` is the old
    positional path, kept for batch / scripted use.

Smoke-tested: menu renders Mode → Drive → World → LiDAR → Dogs
→ Sheep → Perception → Seed → Headless prompts and shows the
selected Seed value in the launch summary. 126 pytest cases still
pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 10:33:34 +00:00
Johnny Fernandes bdaff6a3e1 Interactive Webots launcher (tools/webots_menu.sh)
Single-command picker that prompts for every experimental knob the
project supports, then dispatches to `tools/run_webots.sh` with the
matching env vars. The banner reminds the user that the interpreter
path lives in `tools/setup_env.sh` (or `$HERDING_PYTHON`) so the
"this conda path won't exist on another machine" trap is hard to
fall into.

Prompts, in order:
  Mode          : bc | rl | strombom | sequential | universal
  Drive         : differential | mecanum
  World         : field | field_round
  LiDAR FOV     : 140° | 360°  (skipped when drive=mecanum)
  Dogs          : 1 | 2 (axis-split — only ask leak if 2)
  Sheep         : 1..10
  Perception    : LiDAR | GT bypass
  Headless      : no (windowed) | yes (xvfb-run + fast mode)

Each prompt has a default marked with `*`; pressing Enter through the
whole flow runs the canonical demo (BC / diff / field / 140° /
1 dog / 5 sheep / LiDAR / windowed). The configuration is summarised
in a boxed block before the final "Launch? [Y/n]" confirm.

README quick-start now lists `tools/webots_menu.sh` as the
recommended starting point and shows the env-var-prefixed launcher
invocations (HERDING_LIDAR=360, HERDING_NDOGS=2, HERDING_USE_GT=1)
for non-interactive use.

126 pytest cases still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 02:49:06 +00:00
Johnny Fernandes eadeeafb32 Dual-shepherd soft axis-split (HERDING_AXIS_LEAK)
The strict 100/0 axis mask reaches drive standoff and deadlocks
because each dog has only one degree of freedom left to push the
flock. Soften the mask: each dog leads its assigned axis (full gain)
and contributes ``HERDING_AXIS_LEAK`` on the other axis. ``0.0`` is
the old strict behaviour; ``1.0`` is no mask (both dogs run full
policy, role-redundant). Default ``0.3`` breaks the deadlock while
preserving the "one dog per axis" coordination story.

Implementation:
* `controllers/shepherd_dog/shepherd_dog.py` reads
  `HERDING_AXIS_LEAK` from env / runtime cfg (clamped to [0, 1]),
  prints it next to the axis tag, and multiplies the off-axis
  velocity component by it instead of zeroing.
* `tools/run_webots.sh` writes `HERDING_AXIS_LEAK` into
  `herding_runtime.cfg` so Webots-stripped controller subprocesses
  still see it; defaults to 0.3 when unset.

Webots smoke test (HERDING_NDOGS=2, HERDING_AXIS_LEAK=0.3, strombom,
diff/field, 5 sheep, LiDAR perception, no GT): **5/5 penned at step
13204**, vs the strict 100/0 mask which timed out at 0/5. Penning
trail 1/5 → 2/5 → 4/5 → 5/5 between steps 6200 and 13400 — slower
than single-dog (Strömbom diff/field n=5: 7528) as expected since
the work is split, but the coordination demonstrably succeeds.

This gives the writeup a clean three-row ablation:
  α=0.0  (strict)  → deadlock, 0/5
  α=0.3  (default) → 5/5 @ 13204
  α=1.0  (no mask) → both dogs run full policy (single-dog
                     baseline applied twice; no axis story)

126 pytest cases still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 02:43:40 +00:00
Johnny Fernandes cfbf4a0267 Dual-shepherd axis-split (HERDING_NDOGS=2)
The launcher can now spawn two `ShepherdDog` robots, each masked to a
single axis of motion, so the herding workload is split orthogonally.

Mechanic:
* `HERDING_NDOGS=2` (default 1) tells `tools/run_webots.sh` to replace
  the single-dog node in the generated test world with two copies:
  - `ShepherdDogX` at (-4, -10), `customData "axis=x"`
  - `ShepherdDogY` at (+4, -10), `customData "axis=y"`
  Each spawn position sits south of the field interior so the pair
  doesn't collide with starting sheep.
* `controllers/shepherd_dog/shepherd_dog.py` reads `getCustomData()`
  at startup; when `axis=x|y` it zeroes the off-axis component of every
  action *after* speed modulation and *before* EMA smoothing. With
  `customData` empty the controller behaves identically to single-dog
  mode, so all existing launches are unaffected.
* The dog's emitter line now carries the robot's name
  (`dog:ShepherdDogX:x:y`), and `controllers/sheep/sheep.py` keeps a
  `dogs` dict keyed by name, picking the closest one each step for
  its flee target. Single-dog runs still use the legacy two-field
  `dog:x:y` format thanks to a length check.
* `HERDING_NDOGS` is written into `herding_runtime.cfg` and exported
  to subprocesses so future tooling can read it.

Verified behaviour in Webots smoke tests (HERDING_NDOGS=2, strombom,
diff/field, 5 sheep): both dogs spawn with the expected names and
axis tags, the dual-dog status print appears, each dog acts only on
its assigned axis early in the trial, and the masking is internally
consistent. The pair stalls before penning under pure axis-split
because each dog reaches its drive standoff and then has only one
degree of freedom — useful research finding for the write-up;
coordination strategy (shared CoM, role-switching, etc.) is future
work.

126 pytest cases still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 02:35:38 +00:00
Johnny Fernandes d00da52c3c Portable Python env + 360° LiDAR ablation flag
Two small features.

(1) Portable interpreter
* `tools/setup_env.sh` exports HERDING_PYTHON (default points to the
  project's conda env; override in your shell to retarget).
* Both `controllers/*/runtime.ini` files now use Webots' env-var
  expansion: `COMMAND = $(HERDING_PYTHON)` so the Webots-launched
  controllers pick up the same interpreter as the bash scripts.
* `tools/run_webots.sh`, `tools/webots_sweep{,_gt}.sh` and
  `tools/calibrate_mecanum.sh` all source `setup_env.sh` at the top
  instead of hard-coding `/home/jalf/miniconda3/envs/tir/bin`.
The hard-coded conda path is now exactly one line in `setup_env.sh`'s
fallback default — a single place to edit on a new machine, or
override-once via `export HERDING_PYTHON=...`.

(2) 360° LiDAR FOV ablation
* New `LIDAR_WEBOTS_360` preset matches the existing
  `protos/ShepherdDog360.proto` (360 rays / 2π FOV / 15 m range).
* `tools/run_webots.sh` reads `HERDING_LIDAR=140|360` and swaps the
  diff-drive proto accordingly (mecanum keeps 140° — the
  ShepherdDogMecanum proto has its own LiDAR section). The variant
  is written into `herding_runtime.cfg` so the controller can read
  it even when Webots strips env vars.
* `controllers/shepherd_dog/shepherd_dog.py` picks the matching
  `lidar_cfg` (`HERDING_WEBOTS.lidar` for 140°, `LIDAR_WEBOTS_360`
  otherwise) and feeds it to `detections_from_scan` so the
  perception pipeline interprets ray angles + max range correctly.

Smoke test: `HERDING_LIDAR=360 tools/run_webots.sh 5 strombom
differential field` launches with `ShepherdDog360.proto`, the
controller logs the new mode/drive/world line, and the dog is
penning sheep through 360° perception (4/5 at step 19200 before I
killed the test). No retraining required because the gym already
trains under `LIDAR_FULL` (360° preset).

126 pytest cases still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 02:19:15 +00:00
Johnny Fernandes 7ab69ab0f3 Rename multi-segment functions to two-concept names; polish docstrings
Naming pass: rename functions whose third+ segment is redundant or
implementation-detail, sticking to the codebase's preferred
``noun_verb`` / ``verb_noun`` two-concept idiom. Renames are atomic
across definitions, callers, and tests.

  is_penned_position        →  is_penned
  modulate_speed_near_sheep →  modulate_speed
  mecanum_kinematics_step   →  mecanum_step
  policy_forward_mean       →  forward_mean

Two-concept patterns like ``velocity_to_wheels`` / ``detections_from_scan``
/ ``make_strombom_predictor`` are left alone — they're idiomatic
converters / factories that read as a single concept, and the longer
form aids grep-ability.

Docstring polish:
* ``herding/config.py`` header drops the "previously lived as a
  module-level literal" historical framing — we ship as a single
  thing, so the refactor anecdote no longer earns its keep. The
  usage examples now mention both ``HERDING_WEBOTS`` and
  ``HERDING_MEC_WEBOTS`` presets.

126 pytest cases still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 01:58:15 +00:00
Johnny Fernandes 10c01a938e Drop versioning vocabulary, polish docstrings, fix world-aware policy resolution
User-facing pass after the project was decided to be a single
submission with no inner iterations.

* Remove every "v1"/"v2"/"versioning" reference from the docs:
  - README mecanum section trims the "v1 predates the rewrite" prose
    in favour of a self-contained retrain recipe.
  - The 3.2 GB `training/runs/v1_clean/` backup directory is deleted.
* Refresh control-layer docstrings:
  - `sheep_tracker.py` header now describes the three actual pipeline
    stages (consensus, prediction, pen latching) instead of layering
    the consensus stage on top of a stale "predictive mode" preamble.
  - `controllers/shepherd_dog/shepherd_dog.py` mode list is
    up-to-date — adds `universal`, removes outdated single-policy
    default paths, mentions `HERDING_USE_GT=1` as the perception
    ablation.
* Refresh training command examples:
  - `training/bc/collect.py` and `training/bc/pretrain.py` usage
    snippets show the world-suffixed paths the Makefile actually
    uses; the `--out` arg is now required so old "demos.npz"
    invocations error loudly instead of silently overwriting.
  - `training/README.md` rewritten — drops the legacy `runs/bc`
    diagram, documents the per-(drive, world) pipeline, and adds
    the mecanum retraining caveat.
* Fix policy-directory resolution end-to-end:
  - `tools/run_webots.sh` now tries
    `training/runs/{bc,rl}_<drive>_<world>` first, then the drive-
    only path, then the bare-mode legacy path — matching the actual
    on-disk layout. Previously it looked for `bc_<drive>` (no
    world) and silently fell back to `bc`, masking the world
    selection.
  - `controllers/shepherd_dog/shepherd_dog.py:_resolve_policy_dir`
    has the same fix plus a latent NameError unmasked: it referenced
    `DRIVE_MODE` before that variable was set at module load. The
    block is restructured so MODE/DRIVE_MODE/WORLD are resolved
    first, then the function uses them as explicit arguments.

126 pytest cases still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 01:50:54 +00:00
Johnny Fernandes a584a034e9 Project-wide cleanup: gitignore, dead code, stale artifacts, README
Repo hygiene pass after a long working session.

Files removed:
* stage1_train.log — runtime training log (~125 KB), shouldn't have
  been tracked.
* training/bc/demos.npz — orphan default-name demos file from before
  the world+drive-suffixed naming convention took over; no script
  references it.
* training/runs/bc_dagger{1,2}_differential_field/policy.zip — failed
  DAgger experiment artifacts. Per `memory/dagger_results.md` the
  whole DAgger experiment hit 0/5 on Webots transfer; these checkpoints
  have no consumers.

Untracked-but-deleted (no git change) — also cleaned from disk:
* Root-level runtime logs (43 *.log files, all unused — gitignored now).
* training/bc/{combined,dagger}*.npz (5 huge demo blobs, 2.6 GB
  reclaimed; not committed).
* training/bc/v1/ (2.6 GB backup of pre-DAgger demos; reclaimed).
* training/runs/at_20260426_*/ (orphan timestamped runs; reclaimed).
* All __pycache__/.

Dead code removed:
* `herding/control/strombom.py::compute_action_debug` — no callers
  anywhere in the repo.
* `herding/control/sequential.py::compute_action_debug` — same.
* `herding/control/universal.py::compute_action_diff` — same.

.gitignore extended to cover:
* All *.log files (training/eval/webots logs are runtime artifacts).
* training/bc/*.npz (re-collectable on demand by `make bc_demos`).
* training/bc/v1/.
* .pytest_cache, *.pyc, .claude/.

README refreshed:
* Mecanum + round-world coverage in the headline.
* Quick-start updated for DRIVE/WORLD-suffixed Makefile targets,
  GT-bypass example, and the mecanum-retrain caveat.
* Layout reflects the actual current tree (config.py, both protos,
  both worlds, all tools).
* Results table replaced with the Webots end-to-end numbers from
  the 2026-05-16 sweep (8/8 diff combos + LiDAR/GT comparison).

Verification: 126 pytest cases still pass (was 126 going in — no
test-coverage regression from the dead-code removal).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 01:38:19 +00:00
Johnny Fernandes 3b4c99a6c4 Training pipelines auto-select mecanum-Webots preset
* training/bc/collect.py: --use-webots-preset now picks the
  drive-matched variant. Mecanum drives get HERDING_MEC_WEBOTS
  (with the Webots-calibrated strafe efficiency and bleed) so the
  collected demos reflect the imperfect physical mecanum the
  deployed policy will see. Differential drives still use
  HERDING_WEBOTS (no behaviour change there).
* training/rl/train.py: mecanum fine-tune now *unconditionally*
  applies the HERDING_MEC_WEBOTS robot config to the PPO env (the
  policy must update against the same imperfect kinematics it
  deploys on). Diff fine-tune unchanged.

To retrain a mecanum policy end-to-end against the new proto:

  python -m training.bc.collect --drive-mode mecanum --world field \
    --use-webots-preset \
    --out training/bc/demos_mecanum_field_v2.npz
  python -m training.bc.pretrain --demos training/bc/demos_mecanum_field_v2.npz \
    --out training/runs/bc_mecanum_field_v2 ...
  python -m training.rl.train --bc training/runs/bc_mecanum_field_v2 \
    --out training/runs/rl_mecanum_field_v2 \
    --drive-mode mecanum --world field --use-webots-preset

The same flow for field_round / mecanum/round.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 01:12:06 +00:00
Johnny Fernandes ee77c8606c Gym mecanum kinematics matching to Webots roller-hinge proto
Mecanum proto rewrite in b3cf990 made the wheels truly omnidirectional
in Webots, but with asymmetric slip: forward command produces ~89% of
textbook speed while strafe produces only ~38% plus a consistent
~28% backward bleed-through. v1 BC/RL trained on perfect mecanum
gym kinematics could not herd the new dynamics. To unblock that:

* `mecanum_kinematics_step` gains two parameters that scale the
  realised motion to match a deployed-platform calibration:
    - strafe_efficiency  ∈ (0, 1]  default 1.0
    - strafe_to_forward_bleed     default 0.0
  Forward motion is untouched (textbook X-pattern continues to apply
  to vx_body); only the lateral channel is scaled and bleed is added.
* `RobotConfig` exposes both as drive-config fields with the same
  pass-through defaults so existing diff-drive code and existing
  mecanum training pipelines see no behaviour change.
* `HERDING_MEC_WEBOTS` preset bakes in the values measured against the
  current Webots mecanum proto (strafe_efficiency=0.4,
  strafe_to_forward_bleed=-0.28). Training mecanum BC/RL with this
  preset produces policies that compensate for the imperfect
  physical mecanum at deploy.
* `HerdingEnv` plumbs `RobotConfig.strafe_*` through to
  `mecanum_kinematics_step` so the preset takes effect.
* tools/gen_mecanum_wheels.py is added so the proto's 32 roller
  hinges can be regenerated by editing a single set of constants
  rather than hand-editing 1500+ lines of VRML.

Tests:
* 4 new mecanum_kinematics_step tests (default pass-through, strafe
  scaling, backward bleed, forward unaffected by strafe params).
* 3 new RobotConfig tests (defaults, validation, preset shape).
* Sanity check: gym strafe with HERDING_MEC_WEBOTS over 100 steps
  reproduces the Webots calibration to 2 decimal places.

126 unit tests pass (was 120).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 01:09:47 +00:00
Johnny Fernandes b3cf9909a8 Mecanum proto: replace cylinder wheels with physical roller hinges
Each wheel is now a hub solid + 8 passive HingeJoint rollers (capsules
tilted 45° in body xy plane at the bottom contact point) instead of
a single plain Cylinder. The rollers free-spin around their tilt axes
so the wheel exhibits mecanum X-pattern behaviour: gym-frame strafe
commands now produce body strafe in Webots, where before they
produced wrong-direction motion (the plain cylinders behaved as 4-
wheel skid-steer).

Calibration on flat field, 200 steps each:
                       gym predict      webots out         err
  vx=0.5  vy=0          1.33 m/s +x     1.19 m/s +x       10.9% +x
                        0     m/s +y    -0.10 m/s +y      ~clean
  vx=0    vy=0.5        1.33 m/s +y     0.50 m/s +y       62.1% +y
                        0     m/s +x    -0.37 m/s +x      noticeable
                                                          mecanum
                                                          coupling

Strafe is imperfect (-x bleed-through, magnitude under-shoot) but
direction is correct and the platform is now omnidirectional. Forward
motion is high-fidelity. Tilt signs assigned so diagonal pairs FL+RR
and FR+RL share the same body-frame roller orientation (the standard
X pattern). Two contact-material names "MecanumWheelA/B" are kept for
diagnostic separation; both use the same isotropic Coulomb friction
of 2.0 with forceDependentSlip 0.005.

tools/run_webots.sh ships the matching contactProperties block on
every mecanum launch (re-emitted into the temporary world copy).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 21:54:35 +00:00
Johnny Fernandes 1c197e0ff7 Enable consensus tracker by default + round-world Strömbom fix
Two changes that together raise diff/round gym success ~52%→88% (BC)
and ~68%→88% (RL) without retraining; diff/field stays at 100%.

* TrackerConfig.consensus_k default 1 → 3 (radius 0.5 m, max_age 15
  frames). The same candidate-promotion mechanism that closed the
  Webots LiDAR gap also filters gym tracker phantoms — they show up
  on the round field where sheep run further between detection
  cycles than GATE_M, so each new position spawns a fresh track
  while the stale one persists in memory. SheepTracker() called with
  no tracker_cfg keeps the legacy pass-through behaviour for
  backwards compatibility.
* Strömbom + universal teachers now detect when the natural
  "behind the flock" drive target leaves the curved boundary and
  fall back to pushing the flock radially inward toward the centre.
  Breaks the wall-circling pattern that previously trapped both the
  analytical baselines and the trained policies.

A/B numbers (n_sheep ∈ {1,2,3,5,10}, 5 seeds each, max_steps=15000):

  diff/field  bc:  baseline 100%  consensus 100%
  diff/field  rl:  baseline 100%  consensus 100%
  diff/round  bc:  baseline  52%  consensus  88%
  diff/round  rl:  baseline  68%  consensus  88%

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 21:09:25 +00:00
Johnny Fernandes 03b2df5656 Fix run_webots.sh exit-1 when N=0 (calibration mode)
`active=$(grep -c '^Sheep' "$DST")` returns 0 with exit code 1 when
no sheep are left in the world, which fires set -e and kills the
script before it can launch Webots. Wrap with `|| true` so the
calibration mode (N=0) can actually run.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 20:40:28 +00:00
Johnny Fernandes 2d23289052 Consensus tracker + active scan close Webots 140° LiDAR gap
Two deploy-time fixes that take v1 360°-trained BC/RL from 0/n to n/n
penned on the canonical 140° LiDAR proto for diff/field:

* SheepTracker now supports a consensus stage: new detections start as
  candidate tracks invisible to get_positions(). A candidate must
  accumulate consensus_k matches within consensus_radius_m of itself
  inside a consensus_max_age window to be promoted; otherwise it
  expires. Real sheep self-confirm within 3 frames (≪0.05 m/step);
  wall-return cluster centroids jitter beyond 0.3 m as the dog moves
  and never promote. consensus_k=1 (default) is a no-op so unconfigured
  callers and HERDING_DEFAULT keep prior behaviour.
* HERDING_WEBOTS preset gets consensus_k=3, radius=0.3, max_age=20,
  plus longer forget_steps=300 and predict_steps=180 so confirmed
  sheep persist through long FOV-occlusion gaps a narrow 140° cone
  produces. max_new_tracks_per_step=1 still rate-caps spawn bursts.
* shepherd_dog.py BC/RL empty-obs fallback now rotates the desired
  heading with step_count so the cone actively sweeps the field
  instead of driving due north into the wall.

Verified in headless Webots (HERDING_USE_GT=0, LiDAR only):
  BC diff/field:        5/5 @ 11698, 10/10 @ 15079
  RL diff/field:        5/5 @ 10039, 9/10 @ 18200 (timeout)
  Strömbom diff/field:  5/5 @ 7528
All previously 0/n. 120 unit tests pass; 9 new consensus tests cover
the candidate stage, promotion radius, and one-shot phantom rejection.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 20:19:11 +00:00
Johnny Fernandes 876e14e74f LSTM (RecurrentPPO) experiment + recurrent policy support
Adds RecurrentPPO-based training as an alternative to MLP+frame-stack.
The LSTM gives the policy unbounded temporal memory, addressing the
partial-obs failure mode of the 140° Webots LiDAR (tracker briefly
empties when the dog turns; sporadic phantom tracks confuse decisions).

* training/rl/train_lstm.py: from-scratch RecurrentPPO trainer (no BC
  init, no KL term since there's no reference). Uses HERDING_WEBOTS
  preset so the obs distribution matches deployment.
* training/eval.py: auto-detects RecurrentPPO zips, maintains LSTM
  hidden state across steps, resets between episodes.
* controllers/shepherd_dog/policy_loader.py: PolicyHandle supports
  recurrent policies — state managed inside, reset_recurrent() exposed.

Result on diff/field after 3M steps:
- Gym (default 360°): 69% avg success across n=1..10
- Gym (HERDING_WEBOTS preset, training env): 2% — penning 3-4/5 but
  rarely all 5
- Webots LiDAR 140°: 0/5 (same wall as DAgger and v1 policies)

Conclusion: architectural changes (LSTM vs MLP) don't close the
perception sim-to-real gap. The gym LiDAR sim doesn't faithfully
reproduce Webots phantom-track distribution; any policy trained on the
gym proxy fails to handle real Webots phantoms regardless of
architecture. Closing this gap requires either modeling Webots phantom
patterns in the gym sim (multi-day work) or Webots-in-the-loop
training (very slow). See memory/lstm_results.md for details.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 19:22:32 +00:00
Johnny Fernandes dd5ac669e5 Webots sim-to-real fixes, DAgger pipeline, 360° proto variant
Today's session worked across the full Webots delivery stack — found and
fixed a cluster of bugs blocking the BC/RL transfer, then explored
training-side mitigations for the residual perception gap.

Bug fixes:
- Makefile FP_RATE default 2.0 → 0.0: BC demos used fp_rate=0 but RL
  fine-tune defaulted to fp_rate=2, poisoning the BC obs distribution
  and stalling PPO at 0% success across 1.46M+ steps.
- controllers/{shepherd_dog,sheep}/runtime.ini: Webots was launching
  controllers under system python3 (no numpy) and they were crashing
  silently. Pinned to the conda tir env.
- herding/config.py HERDING_WEBOTS preset: pen_latch_depth 0.5 → 2.0,
  max_new_tracks_per_step 3 → 1, static_reject 0.8 → 1.2. Stops phantom
  FPs near the gate from latching as permanently-penned tracks.
- herding/perception/sheep_tracker.py: penned tracks now decay at
  forget_steps × 8 instead of living forever. Adds get_positions
  min_freshness filter for deploy-time use.

Training/eval matches deployment:
- training/bc/collect.py: --dagger-policy flag for DAgger rollouts
  (policy drives, teacher labels) + --use-webots-preset for matched
  140° tracker + DR config.
- controllers/shepherd_dog/shepherd_dog.py: scan-fallback (0, 0.6) when
  BC/RL sees empty sheep_positions — recovers from FOV gaps.

Tooling:
- tools/dagger_round.sh: one-shot DAgger round (collect + concat + bc).
- tools/webots_sweep_gt.sh: full sweep with HERDING_USE_GT=1 for the
  perception-gap diagnosis matrix.
- protos/ShepherdDog360.proto: 360° FOV variant for the FOV-ablation
  comparison. Canonical proto stays at 140° per project spec.

Artifacts: v1 BC/RL policies for all 4 (drive × world) combos trained
in clean gym (success: diff/field 90-100%, diff/round 58%, mec/field
60-100%, mec/round 50-100%). DAgger r1/r2 BCs for diff/field show
12%→38% progression on gym HERDING_WEBOTS proxy but did not close
to actual Webots LiDAR (0/5 throughout). Next: LSTM policy or
learned tracker per the project-state memory.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 17:21:02 +00:00
Johnny Fernandes c61df91950 Checkpoint 10 2026-05-13 23:22:17 +01:00
Johnny Fernandes aa598fcb83 Checkpoint 10 2026-05-13 23:14:16 +01:00
Johnny Fernandes 0f807003a5 Results from last checkpoint 2026-05-13 20:26:18 +00:00
Johnny Fernandes 683de740af Checkpoint 9 2026-05-13 13:46:50 +01:00
62 changed files with 8594 additions and 760 deletions
+21 -1
View File
@@ -1,5 +1,7 @@
# Python # Python
__pycache__/ __pycache__/
*.pyc
.pytest_cache/
# Training artefacts: ignore all run outputs except deployable policies # Training artefacts: ignore all run outputs except deployable policies
training/runs/** training/runs/**
@@ -8,8 +10,26 @@ training/runs/**
!training/runs/*/ !training/runs/*/
!training/runs/*/policy.zip !training/runs/*/policy.zip
# Webots launcher scratch # BC demo blobs — these get regenerated by `python -m training.bc.collect`
# and are too large to track. Keep them out of git.
training/bc/*.npz
training/bc/v1/
# Webots launcher scratch (the _test.wbt files are emitted on every run)
worlds/** worlds/**
!worlds/field.wbt !worlds/field.wbt
!worlds/field_round.wbt !worlds/field_round.wbt
herding_runtime.cfg herding_runtime.cfg
# Runtime logs — all of these are produced by training/eval/webots runs
# and are not useful to track in git. Keep summary docs/markdown only.
*.log
*.stdout
calibrate_mecanum.log
training/.run_done
# Local-only training backups (never committed).
_backup_pretrain/
# Tooling
.claude/
+178 -10
View File
@@ -41,48 +41,101 @@ TAG = $(DRIVE)_$(WORLD)
BC_DEMOS = training/bc/demos_$(TAG).npz BC_DEMOS = training/bc/demos_$(TAG).npz
BC_DIR = training/runs/bc_$(TAG) BC_DIR = training/runs/bc_$(TAG)
RL_DIR = training/runs/rl_$(TAG) RL_DIR = training/runs/rl_$(TAG)
# Stage-2 "speed pass": continue PPO from RL_DIR with TIME_W < 0 so the
# policy keeps Stage-1's success rate but cuts time-to-pen. Output is a
# separate run dir so Stage-1 stays comparable.
RL_FAST_DIR = training/runs/rl_fast_$(TAG)
BC_POLICY = $(BC_DIR)/policy.zip BC_POLICY = $(BC_DIR)/policy.zip
RL_POLICY = $(RL_DIR)/policy.zip RL_POLICY = $(RL_DIR)/policy.zip
RL_FAST_POLICY = $(RL_FAST_DIR)/policy.zip
# --- Demo collection --- # --- Demo collection ---
TEACHER ?= universal TEACHER ?= universal
# Round field is fundamentally harder (narrow gate at south of a circle). # Mecanum has more complex dynamics and a weaker teacher imitation signal
# Default to more demos there to give BC a fair shot at 60%+. # (val_cos ≈ 0.70 vs ≥ 0.88 for differential). Give it more demos and
# longer BC training to compensate.
ifeq ($(DRIVE),mecanum)
ifeq ($(WORLD),field_round) ifeq ($(WORLD),field_round)
SEEDS_PER_N ?= 40 SEEDS_PER_N ?= 80
else
SEEDS_PER_N ?= 50
endif
else
# Round field is harder; more demos give BC a fair shot at 60%+.
ifeq ($(WORLD),field_round)
SEEDS_PER_N ?= 60
else else
SEEDS_PER_N ?= 25 SEEDS_PER_N ?= 25
endif endif
endif
SUBSAMPLE ?= 3 SUBSAMPLE ?= 3
FRAME_STACK ?= 4 FRAME_STACK ?= 4
DEMO_MAX_STEPS ?= 100000 DEMO_MAX_STEPS ?= 100000
# --- Behaviour cloning --- # --- Behaviour cloning ---
ifeq ($(DRIVE),mecanum)
ifeq ($(WORLD),field_round) ifeq ($(WORLD),field_round)
BC_EPOCHS ?= 200
else
BC_EPOCHS ?= 100 BC_EPOCHS ?= 100
endif
else
ifeq ($(WORLD),field_round)
BC_EPOCHS ?= 150
else else
BC_EPOCHS ?= 60 BC_EPOCHS ?= 60
endif endif
endif
BC_NET_ARCH ?= 512,512 BC_NET_ARCH ?= 512,512
# --- Domain randomisation (used by bc_demos and rl targets) ---
# FP_RATE: mean false-positive detections injected per step (Poisson λ).
# ACTION_SMOOTH_TRAIN: EMA on actions to match Webots controller (0.55).
# WHEEL_SLIP_STD: Gaussian wheel-speed noise for mecanum dynamics gap.
#
# FP_RATE is used consistently in BC demos *and* RL: BC collection runs
# in PRIVILEGED mode (teacher sees GT; student obs sees the FP-injected
# tracker output), so the policy learns to denoise to the GT signal.
# Mismatched FP_RATE between BC/RL was the root cause of an earlier
# regression (BC=0, RL=2 → PPO stalled at 0% success).
FP_RATE ?= 0.0
ACTION_SMOOTH_TRAIN ?= 0.55
WHEEL_SLIP_STD ?= 0.05
# --- KL-PPO fine-tune --- # --- KL-PPO fine-tune ---
# Round field: longer training, looser KL, no time penalty (success # Round field: longer training, looser KL, no time penalty (success
# must be learned before speed is rewarded). # must be learned before speed is rewarded).
ifeq ($(WORLD),field_round) ifeq ($(WORLD),field_round)
PPO_STEPS ?= 4000000 PPO_STEPS ?= 4000000
KL ?= 0.02 KL ?= 0.02
TIME_W ?= 0.0
else else
PPO_STEPS ?= 2000000 PPO_STEPS ?= 2000000
KL ?= 0.05 KL ?= 0.05
TIME_W ?= -0.05
endif endif
# Time penalty is 0 until success rate is high. Earlier runs showed
# TIME_W=-0.05 traded ~10 pts of success for speed on hard combos —
# learn to succeed first, optimize speed in a later pass.
TIME_W ?= 0.0
IMITATE ?= 0.0 IMITATE ?= 0.0
# PPO rollouts at full difficulty so the training distribution matches # PPO rollouts at full difficulty so the training distribution matches
# eval (deployment). Anything lower causes a train/eval mismatch that # eval (deployment). Anything lower causes a train/eval mismatch that
# can make RL eval worse than BC. # can make RL eval worse than BC.
DIFFICULTY ?= 1.0 DIFFICULTY ?= 1.0
# --- Stage-2 "speed pass" (rl_fast) ---
# Continues from RL_DIR with a negative TIME_W. Tighter KL keeps the
# policy near the Stage-1 success rate while step-count drops.
# Differential and mecanum respond differently: mecanum needs a stronger
# time penalty to achieve speed gains; differential only needs a light
# touch (-0.02) — stronger penalties trade success for speed without gain.
RL_FAST_STEPS ?= 1000000
RL_FAST_KL ?= 0.05
ifeq ($(DRIVE),mecanum)
RL_FAST_TIME_W ?= -0.05
else
RL_FAST_TIME_W ?= -0.02
endif
# --- Evaluation --- # --- Evaluation ---
EVAL_SEEDS ?= 10 EVAL_SEEDS ?= 10
EVAL_MAX_STEPS ?= 15000 EVAL_MAX_STEPS ?= 15000
@@ -92,14 +145,32 @@ N ?= 10
MODE ?= rl MODE ?= rl
.PHONY: all bc_demos bc rl eval test webots clean clean_all help \ .PHONY: all bc_demos bc rl rl_fast eval eval_fast eval_all eval_all_fast \
test webots webots_sweep clean clean_all help \
train_all train_diff_rect train_diff_round \ train_all train_diff_rect train_diff_round \
train_mec_rect train_mec_round train_mec_rect train_mec_round \
train_all_fast train_diff_rect_fast train_diff_round_fast \
train_mec_rect_fast train_mec_round_fast \
remote_full
all: eval all: eval
# Export HERDING_WORLD so that geometry.py picks it up at import time. # Export HERDING_WORLD so that geometry.py picks it up at import time.
export HERDING_WORLD = $(WORLD) export HERDING_WORLD = $(WORLD)
# Force Python stdout/stderr unbuffered so progress is visible live when
# the build is run under tee / nohup / tmux pipes.
export PYTHONUNBUFFERED = 1
# Mecanum needs --use-webots-preset so collect/rl pick up
# HERDING_MEC_WEBOTS — the gym mecanum kinematics get the strafe
# efficiency and forward-bleed match against the physical-roller
# Webots proto. Without this flag the policy trains on textbook
# X-pattern mecanum and fails on deployment.
ifeq ($(DRIVE),mecanum)
WEBOTS_PRESET_FLAG = --use-webots-preset
else
WEBOTS_PRESET_FLAG =
endif
bc_demos: $(BC_DEMOS) bc_demos: $(BC_DEMOS)
$(BC_DEMOS): $(BC_DEMOS):
@@ -108,7 +179,11 @@ $(BC_DEMOS):
--seeds-per-n $(SEEDS_PER_N) --subsample $(SUBSAMPLE) \ --seeds-per-n $(SEEDS_PER_N) --subsample $(SUBSAMPLE) \
--frame-stack $(FRAME_STACK) --drive-mode $(DRIVE) \ --frame-stack $(FRAME_STACK) --drive-mode $(DRIVE) \
--world $(WORLD) \ --world $(WORLD) \
--max-steps $(DEMO_MAX_STEPS) --max-steps $(DEMO_MAX_STEPS) \
--fp-rate $(FP_RATE) \
--action-smooth $(ACTION_SMOOTH_TRAIN) \
--wheel-slip-std $(WHEEL_SLIP_STD) \
$(WEBOTS_PRESET_FLAG)
bc: $(BC_POLICY) bc: $(BC_POLICY)
$(BC_POLICY): $(BC_DEMOS) $(BC_POLICY): $(BC_DEMOS)
@@ -123,18 +198,52 @@ $(RL_POLICY): $(BC_POLICY)
--total-timesteps $(PPO_STEPS) --kl-coef $(KL) \ --total-timesteps $(PPO_STEPS) --kl-coef $(KL) \
--imitate-weight $(IMITATE) --time-weight $(TIME_W) \ --imitate-weight $(IMITATE) --time-weight $(TIME_W) \
--difficulty $(DIFFICULTY) \ --difficulty $(DIFFICULTY) \
--drive-mode $(DRIVE) --world $(WORLD) --drive-mode $(DRIVE) --world $(WORLD) \
--fp-rate $(FP_RATE) \
--action-smooth $(ACTION_SMOOTH_TRAIN) \
--wheel-slip-std $(WHEEL_SLIP_STD)
# (rl/train.py auto-applies HERDING_MEC_WEBOTS when drive=mecanum;
# no --use-webots-preset flag needed.)
eval: $(RL_POLICY) eval: $(RL_POLICY)
$(PY) -m training.eval --policy $(RL_DIR) \ $(PY) -m training.eval --policy $(RL_DIR) \
--max-flock 10 --max-steps $(EVAL_MAX_STEPS) --n-seeds $(EVAL_SEEDS) \ --max-flock 10 --max-steps $(EVAL_MAX_STEPS) --n-seeds $(EVAL_SEEDS) \
--drive-mode $(DRIVE) --world $(WORLD) --drive-mode $(DRIVE) --world $(WORLD)
# --- Stage-2 speed pass ---
# Continues PPO from $(RL_DIR) with a per-step time penalty so the
# policy keeps Stage-1's success rate but cuts mean steps-to-pen. Use
# `make rl_fast` after Stage-1 RL has converged (success ≥ teacher).
rl_fast: $(RL_FAST_POLICY)
$(RL_FAST_POLICY): $(RL_POLICY)
$(PY) -m training.rl.train \
--bc $(RL_DIR) --out $(RL_FAST_DIR) \
--total-timesteps $(RL_FAST_STEPS) --kl-coef $(RL_FAST_KL) \
--imitate-weight $(IMITATE) --time-weight $(RL_FAST_TIME_W) \
--difficulty $(DIFFICULTY) \
--drive-mode $(DRIVE) --world $(WORLD) \
--fp-rate $(FP_RATE) \
--action-smooth $(ACTION_SMOOTH_TRAIN) \
--wheel-slip-std $(WHEEL_SLIP_STD)
eval_fast: $(RL_FAST_POLICY)
$(PY) -m training.eval --policy $(RL_FAST_DIR) \
--max-flock 10 --max-steps $(EVAL_MAX_STEPS) --n-seeds $(EVAL_SEEDS) \
--drive-mode $(DRIVE) --world $(WORLD)
test: test:
$(PY) -m pytest tests/ $(PY) -m pytest tests/
webots: webots:
tools/run_webots.sh $(N) $(MODE) $(DRIVE) $(WORLD) @bash tools/webots_menu.sh
# Headless sweep across all modes × worlds × flock sizes.
# Results are written to webots_sweep.log.
# Set USE_GT=1 to bypass LiDAR tracker (isolate perception from policy).
webots_sweep:
env $(if $(USE_GT),HERDING_USE_GT=1,) \
PATH="$(CONDA_PREFIX)/bin:$(PATH)" \
bash tools/webots_sweep.sh webots_sweep.log
clean: clean:
rm -f $(BC_DEMOS) rm -f $(BC_DEMOS)
@@ -159,6 +268,65 @@ train_mec_round:
train_all: train_diff_rect train_diff_round train_mec_rect train_mec_round train_all: train_diff_rect train_diff_round train_mec_rect train_mec_round
# Gym eval sweep over all 4 combos. Use after train_all / train_all_fast.
eval_all:
@for d in differential mecanum; do \
for w in field field_round; do \
echo ""; \
echo "=== BC $$d / $$w ==="; \
$(PY) -m training.eval --policy training/runs/bc_$${d}_$${w} \
--max-flock 10 --max-steps $(EVAL_MAX_STEPS) --n-seeds $(EVAL_SEEDS) \
--drive-mode $$d --world $$w; \
echo ""; \
echo "=== RL $$d / $$w ==="; \
$(PY) -m training.eval --policy training/runs/rl_$${d}_$${w} \
--max-flock 10 --max-steps $(EVAL_MAX_STEPS) --n-seeds $(EVAL_SEEDS) \
--drive-mode $$d --world $$w; \
done; \
done
# One-shot remote runbook: clean → Stage-1 train → Stage-1 eval → Stage-2
# train → Stage-2 eval. Each step pipes to its own log file in the repo
# root so the run is fully unattended.
remote_full:
$(MAKE) clean_all
$(MAKE) train_all 2>&1 | tee stage1_train.log
$(MAKE) eval_all 2>&1 | tee stage1_eval.log
$(MAKE) train_all_fast 2>&1 | tee stage2_train.log
$(MAKE) eval_all_fast 2>&1 | tee stage2_eval.log
@echo ""
@echo "===================================================="
@echo " Done. Logs: stage1_train.log stage1_eval.log"
@echo " stage2_train.log stage2_eval.log"
@echo "===================================================="
eval_all_fast:
@for d in differential mecanum; do \
for w in field field_round; do \
echo ""; \
echo "=== RL_FAST $$d / $$w ==="; \
$(PY) -m training.eval --policy training/runs/rl_fast_$${d}_$${w} \
--max-flock 10 --max-steps $(EVAL_MAX_STEPS) --n-seeds $(EVAL_SEEDS) \
--drive-mode $$d --world $$w; \
done; \
done
# --- Stage-2 sweep ---
train_diff_rect_fast:
$(MAKE) DRIVE=differential WORLD=field rl_fast
train_diff_round_fast:
$(MAKE) DRIVE=differential WORLD=field_round rl_fast
train_mec_rect_fast:
$(MAKE) DRIVE=mecanum WORLD=field rl_fast
train_mec_round_fast:
$(MAKE) DRIVE=mecanum WORLD=field_round rl_fast
train_all_fast: train_diff_rect_fast train_diff_round_fast \
train_mec_rect_fast train_mec_round_fast
help: help:
@echo "Targets:" @echo "Targets:"
@echo " make full pipeline (bc_demos -> bc -> rl -> eval)" @echo " make full pipeline (bc_demos -> bc -> rl -> eval)"
+132 -79
View File
@@ -2,18 +2,18 @@
Group G25 — *Diogo Costa, Johnny Fernandes, Nelson Neto* Group G25 — *Diogo Costa, Johnny Fernandes, Nelson Neto*
A differential-drive shepherd dog that herds 110 sheep through a 3 m A shepherd dog that herds 110 sheep through a 3 m gate into an
gate into an external pen. The dog has three deployable modes: external pen. Two worlds (`field` rectangular, `field_round` circular),
two drives (`differential`, `mecanum`), and four deployable control
modes:
| Mode | Source | Role | | Mode | Source | Role |
|---|---|---| |---|---|---|
| `strombom` | Strömbom et al. (2014) collect/drive heuristic | Analytic baseline | | `strombom` | Strömbom et al. (2014) collect/drive heuristic | Analytic baseline |
| `bc` | Behaviour cloning of the Strömbom teacher | Imitation learning result | | `sequential` | Single-target "pin-and-push" | Alternative analytic baseline |
| `bc` | Behaviour cloning of the universal teacher | Imitation learning result |
| `rl` | KL-regularised PPO fine-tune of `bc` | Reward-driven refinement | | `rl` | KL-regularised PPO fine-tune of `bc` | Reward-driven refinement |
`sequential` (single-target pin-and-push) is kept as an alternative
analytic baseline.
## Perception ## Perception
The dog perceives sheep **only through its front-mounted 140° LiDAR** The dog perceives sheep **only through its front-mounted 140° LiDAR**
@@ -52,27 +52,46 @@ Privileged ground-truth perception is available for ablation —
# 1. Set up the Python env (any venv with PyTorch + SB3) # 1. Set up the Python env (any venv with PyTorch + SB3)
pip install -r training/requirements.txt pip install -r training/requirements.txt
# 2. Smoke test (70 pytest cases, < 1 s) # 2. Smoke test (126 pytest cases, < 1 s)
make test make test
# 3. Reproduce the full pipeline (~3060 min CPU) # 3. Reproduce a full pipeline (DRIVE+WORLD specific, ~1 h CPU)
make # demos -> bc -> rl -> eval make DRIVE=differential WORLD=field # demos -> bc -> rl -> eval
make DRIVE=differential WORLD=field_round
make DRIVE=mecanum WORLD=field # see note below
make train_all # all 4 combos sequentially
# Individual stages (each rebuilds upstream artefacts if missing): # Individual stages (each rebuilds upstream artefacts if missing):
make bc_demos # sim demos make DRIVE=differential WORLD=field bc_demos # sim demos
make bc # behaviour clone make DRIVE=differential WORLD=field bc # behaviour clone
make rl # KL-PPO fine-tune make DRIVE=differential WORLD=field rl # KL-PPO fine-tune
make eval # 10-seed env eval of rl make DRIVE=differential WORLD=field eval # 10-seed env eval
# 4. Run in Webots # 4. Run in Webots — interactive picker (recommended starting point)
make webots N=10 MODE=bc # behaviour-cloned MLP tools/webots_menu.sh
make webots N=10 MODE=rl # KL-PPO fine-tune # Prompts for mode / drive / world / LiDAR FOV / number of dogs /
make webots N=10 MODE=strombom # analytic baseline # flock size / perception (LiDAR vs GT) / headless, then dispatches.
# (or invoke directly: tools/run_webots.sh 10 rl)
# Or invoke the launcher directly:
tools/run_webots.sh 10 bc differential field # BC, diff, rect field
tools/run_webots.sh 10 rl differential field_round # RL, diff, round field
tools/run_webots.sh 5 strombom differential field # analytic baseline
HERDING_USE_GT=1 tools/run_webots.sh 5 strombom differential field
# GT bypass ablation
HERDING_LIDAR=360 tools/run_webots.sh 5 bc differential field
# 360° FOV ablation
HERDING_NDOGS=2 HERDING_AXIS_LEAK=0.3 tools/run_webots.sh 5 strombom differential field
# dual-shepherd axis split
``` ```
`make help` lists every target and the overridable hyperparameters `make help` lists every Makefile target and the overridable hyperparameters.
(e.g. `make rl PPO_STEPS=2000000 KL=0.02`).
**Mecanum note**: the `ShepherdDogMecanum.proto` uses physical roller
hinges in Webots. The Webots calibration shows ~60% strafe efficiency
and ~28% backward bleed compared to textbook mecanum; the gym
kinematics in `HERDING_MEC_WEBOTS` are tuned to match. **Mecanum BC/RL
policies need to be retrained against this preset** — see the retrain
flow in the Mecanum results section below.
## Documentation map ## Documentation map
@@ -87,56 +106,67 @@ make webots N=10 MODE=strombom # analytic baseline
``` ```
herding/ — perception / control / world primitives herding/ — perception / control / world primitives
world/ environment-side physics & geometry config.pyfrozen dataclasses for all tunable parameters;
geometry.py field/pen constants, robot specs named presets HERDING_DEFAULT / HERDING_WEBOTS /
diffdrive.py differential-drive kinematics HERDING_MEC_WEBOTS
world/
geometry.py field/pen constants, world-shape switch
diffdrive.py differential + mecanum kinematics
flocking_sim.py Reynolds + Strömbom 2014 sheep dynamics flocking_sim.py Reynolds + Strömbom 2014 sheep dynamics
perception/ — LiDAR → tracked-sheep pipeline perception/
lidar_sim.py fast 2D raycast for the env lidar_sim.py fast 2D raycast for the gym env
lidar_perception.py scan → world-frame cluster centroids + filters lidar_perception.py scan → world-frame cluster centroids + filters
sheep_tracker.py multi-target NN tracker with FOV memory sheep_tracker.py multi-target NN tracker with FOV memory
and the consensus-promotion stage
obs.py 32-D order-invariant observation builder obs.py 32-D order-invariant observation builder
control/ — every dog mode's action source control/
strombom.py canonical CoM collect/drive heuristic strombom.py canonical CoM collect/drive heuristic
(round-world aware)
sequential.py single-target "pin-and-push" alternative sequential.py single-target "pin-and-push" alternative
active_scan.py wraps a base teacher with opening rotation + universal.py teacher used for BC demo collection
walk-to-centre fallback (Strömbom + mecanum omega + straggler recovery)
active_scan.py rotate-on-empty + walk-to-centre fallback
modulation.py shared near-sheep speed-modulation helper modulation.py shared near-sheep speed-modulation helper
controllers/ controllers/
sheep/sheep.py — Webots sheep controller (uses herding.world.flocking_sim) sheep/sheep.py — Webots sheep controller
shepherd_dog/ shepherd_dog/
shepherd_dog.py — Webots dog controller, mode-switched shepherd_dog.py — Webots dog controller, mode-switched
policy_loader.py — lazy SB3 policy loader (auto-detects frame stack) policy_loader.py — SB3 PPO / RecurrentPPO loader with frame stack
training/ training/
herding_env.py — Gymnasium env (LiDAR + tracker by default) herding_env.py — Gymnasium env (LiDAR + tracker by default)
bc/collect.py — sim demos via the active-scan teacher bc/collect.py — sim demos via the active-scan teacher
bc/pretrain.py — supervised BC of (obs, action) demos into MLP bc/pretrain.py — supervised BC into MLP
rl/train.py — KL-regularised PPO fine-tune of BC rl/train.py — KL-regularised PPO fine-tune of BC
rl/train_lstm.py — RecurrentPPO variant (ablation)
eval.py — analytic + learned policy comparison harness eval.py — analytic + learned policy comparison harness
bc/demos.npz — collected demonstrations (gitignored)
runs/ — checkpoints (whitelisted in .gitignore) runs/ — checkpoints (whitelisted in .gitignore)
requirements.txt requirements.txt
tests/ tests/ — 126 pytest cases, < 1 s on CPU
conftest.py — pytest setup (adds project root to sys.path)
test_geometry.py — geometric predicates + constants
test_diffdrive.py — kinematics and (vx, vy) → wheel-speed map
test_obs.py — observation builder (shape, normalisation, order)
test_control.py — speed modulation + analytic teachers + active scan
test_perception.py — LiDAR sim + clustering + tracker
test_env.py — Gymnasium contract + determinism + reward
tools/ tools/
run_webots.sh — launch Webots with N sheep + chosen mode run_webots.sh — launch Webots with N sheep + chosen mode + world
webots_sweep.sh — headless sweep across modes × drives × worlds
webots_sweep_gt.sh — same with HERDING_USE_GT=1 (perfect perception)
calibrate_mecanum.sh — measure mecanum body velocity vs gym prediction
gen_mecanum_wheels.py — regenerate the 32 mecanum roller hinges
benchmark_lidar.py — tracker quality benchmark
Makefile — pipeline orchestrator (make / make rl / make test / …) Makefile — pipeline orchestrator
(make DRIVE=… WORLD=… rl, make train_all, …)
worlds/ worlds/
field.wbt — main world (3 m gate, external pen) field.wbt — rectangular world (3 m gate, external pen)
field_round.wbt — circular world (radius 15 m, same pen)
protos/
Sheep.proto — sheep robot
ShepherdDog.proto — diff-drive dog, 140° LiDAR
ShepherdDog360.proto — diff-drive dog, 360° LiDAR (ablation)
ShepherdDogMecanum.proto — 4-wheel mecanum with physical roller hinges
protos/ — Sheep / ShepherdDog robot definitions
docs/project.md — original course proposal/goals docs/project.md — original course proposal/goals
``` ```
@@ -151,48 +181,71 @@ scattering the flock. Direction (intent) is preserved.
All modes also share the same EMA action smoother in All modes also share the same EMA action smoother in
`controllers/shepherd_dog/shepherd_dog.py:ACTION_SMOOTH = 0.55`. `controllers/shepherd_dog/shepherd_dog.py:ACTION_SMOOTH = 0.55`.
## Results — env eval, 10 seeds × n=1..10 ## Results — Webots end-to-end, canonical 140° LiDAR
`max_steps=15000`, full-field spawn distribution. Success rate per Each cell = "OK at step X" means the dog penned all N sheep in a single
flock size, then mean steps over successful seeds. trial, `HERDING_USE_GT=0` (LiDAR perception, no ground truth bypass),
default consensus tracker.
### Success rate (%) ### Differential drive
| n | Strömbom | `bc` | `rl` | | Mode | World | n=5 | n=10 |
|---:|---:|---:|---:| |---|---|---:|---:|
| 1 | 30 | 80 | **90** | | Strömbom | field | 7528 | 11620 |
| 2 | 90 | 50 | **90** | | Strömbom | field_round | 8611 | 10339 |
| 3 | 60 | 90 | **90** | | Sequential | field | 7135 | 16843 |
| 4 | 40 | 80 | **90** | | Sequential | field_round | 6019 | 8494 |
| 5 | 60 | 70 | **100** | | BC | field | 11698 | 15079 |
| 6 | 30 | 80 | 80 | | BC | field_round | 7234 | 11320 |
| 7 | 70 | 80 | **100** | | RL | field | 10039 | 13954 |
| 8 | 30 | 100 | **100** | | RL | field_round | 5803 | 9151 |
| 9 | 40 | 90 | **100** |
| 10 | 50 | 100 | **100** |
### Mean penned per episode (out of n) RL is **strictly faster than BC** on every comparable cell.
| n | Strömbom | `bc` | `rl` | ### LiDAR vs GT bypass (diff drive)
|---:|---:|---:|---:|
| 1 | 0.30 | 0.80 | **0.90** |
| 5 | 3.90 | 4.10 | **5.00** |
| 8 | 4.20 | 8.00 | **8.00** |
| 10 | 7.40 | 10.00 | **10.00** |
### Takeaways GT bypass replaces the LiDAR tracker with perfect emitter positions.
LiDAR is the default; GT is a perception ablation
(`HERDING_USE_GT=1`):
- **BC clearly beats Strömbom** under realistic LiDAR conditions (full | Mode | World | n=5 LiDAR | n=5 GT | n=10 LiDAR | n=10 GT |
field, partial observability). Strömbom struggles on small flocks |---|---|---:|---:|---:|---:|
where a single sheep can spawn beyond the LiDAR's 12 m range; BC | Strömbom | field | 7528 | **5254** | 11620 | **7342** |
learned active perception from the demos. | Strömbom | field_round | 8611 | **3631** | 10339 | **7084** |
- **RL refines BC** without regressing on any cell. Ties or beats BC | Sequential | field | **7135** | 11092 | 16843 | **8698** |
at every flock size; biggest gains at n=1 and n=4 where BC's | Sequential | field_round | 6019 | **3454** | 8494 | **7324** |
imitation of Strömbom's drive heuristic was sub-optimal.
- **Aggressive reward shaping doesn't help** — a more aggressive GT is generally faster (perfect perception → fewer wasted steps).
variant (β=0.02, W_TIME=-0.1, W_IMITATE=0, 3 M steps) trained as Sequential n=5 / field is the one cell where GT is *slower* — its
an ablation was strictly worse than the conservative tune shipped straggler heuristic appears to over-correct when the dog has full
here (β=0.05, W_IMITATE=0.5, 1 M steps). information.
### Mecanum (differential is the headline)
`ShepherdDogMecanum.proto` has 32 physical roller hinges giving true
omnidirectional motion in Webots — `tools/calibrate_mecanum.sh`
confirms the X-pattern. Calibration shows ~60% strafe efficiency vs
textbook (versus ~89% on forward), so the gym needs to match the
imperfect physical mecanum for the trained policy to compensate.
`HERDING_MEC_WEBOTS` is the matched preset; `training/bc/collect.py`
and `training/rl/train.py` auto-select it for mecanum runs. Mecanum
policies were trained on the textbook gym, so they need to be
retrained against `HERDING_MEC_WEBOTS` (≈ 2 h per combo, 4 combos):
```bash
python -m training.bc.collect \
--drive-mode mecanum --world field --use-webots-preset \
--out training/bc/demos_mecanum_field.npz
python -m training.bc.pretrain \
--demos training/bc/demos_mecanum_field.npz \
--out training/runs/bc_mecanum_field
python -m training.rl.train \
--bc training/runs/bc_mecanum_field \
--out training/runs/rl_mecanum_field \
--drive-mode mecanum --world field --use-webots-preset
```
Repeat for `field_round`.
## License ## License
+10
View File
@@ -0,0 +1,10 @@
# Webots reads this file before starting the controller. It tells
# Webots which Python interpreter to launch (default is system
# `python3`, which usually lacks NumPy).
#
# Webots supports environment-variable expansion in this file, so we
# defer the interpreter path to $HERDING_PYTHON — set that variable
# once in your shell (or `tools/setup_env.sh`) before launching
# Webots and the controllers in this project will pick it up.
[python]
COMMAND = $(HERDING_PYTHON)
+55 -7
View File
@@ -27,7 +27,7 @@ from herding.world.diffdrive import heading_speed_to_wheels
from herding.world.flocking_sim import MAX_SPEED, compute_heading_speed from herding.world.flocking_sim import MAX_SPEED, compute_heading_speed
from herding.world.geometry import ( from herding.world.geometry import (
SHEEP_MAX_WHEEL_OMEGA, SHEEP_MAX_WHEEL_OMEGA,
is_penned_position, is_penned,
) )
@@ -37,6 +37,37 @@ timestep = int(robot.getBasicTimeStep())
name = robot.getName() name = robot.getName()
self_node = robot.getSelf() self_node = robot.getSelf()
# Seed Python's RNG (shared with the dog controller) so a fixed
# HERDING_SEED produces reproducible runs. Each sheep mixes its name
# into the seed so the flock isn't all identical.
def _read_runtime_cfg():
cfg_path = os.path.join(_PROJECT_ROOT, "herding_runtime.cfg")
out = {}
if os.path.exists(cfg_path):
try:
with open(cfg_path) as f:
for line in f:
line = line.strip()
if not line or line.startswith("#") or "=" not in line:
continue
k, _, v = line.partition("=")
out[k.strip().upper()] = v.strip()
except OSError:
pass
return out
_rt = _read_runtime_cfg()
_seed_raw = (os.environ.get("HERDING_SEED")
or _rt.get("HERDING_SEED")
or "").strip()
if _seed_raw:
try:
# XOR with hash(name) so different sheep have different seeds
# but the flock as a whole is deterministic for a given seed.
random.seed(int(_seed_raw) ^ (hash(name) & 0x7FFFFFFF))
except ValueError:
pass
left_motor = robot.getDevice("left wheel motor") left_motor = robot.getDevice("left wheel motor")
right_motor = robot.getDevice("right wheel motor") right_motor = robot.getDevice("right wheel motor")
left_motor.setPosition(float("inf")) left_motor.setPosition(float("inf"))
@@ -75,7 +106,7 @@ def paint_pink():
# --- State --- # --- State ---
wander_angle = random.uniform(-math.pi, math.pi) wander_angle = random.uniform(-math.pi, math.pi)
step_count = 0 step_count = 0
dog_x, dog_y = None, None dogs = {} # name → (x, y); supports the dual-dog setup
peers = {} # name → (x, y); periodically pruned peers = {} # name → (x, y); periodically pruned
penned = False penned = False
@@ -92,24 +123,41 @@ while robot.step(timestep) != -1:
pos = gps.getValues() pos = gps.getValues()
x, y = pos[0], pos[1] x, y = pos[0], pos[1]
if not penned and is_penned_position(x, y): if not penned and is_penned(x, y):
penned = True penned = True
paint_pink() paint_pink()
# Stale peers get dropped periodically so a peer that's gone silent # Stale peers get dropped periodically so a peer that's gone silent
# doesn't permanently distort the local CoM. # doesn't permanently distort the local CoM. Dogs are pruned too —
# otherwise a temporarily-silent dog stays in `dogs` forever and
# the closest-dog flee target stops being accurate.
if step_count % 30 == 0: if step_count % 30 == 0:
peers.clear() peers.clear()
dogs.clear()
while receiver.getQueueLength() > 0: while receiver.getQueueLength() > 0:
msg = receiver.getString() msg = receiver.getString()
receiver.nextPacket() receiver.nextPacket()
parts = msg.split(":") parts = msg.split(":")
if parts[0] == "dog" and len(parts) >= 3: # Legacy single-dog message: "dog:x:y".
dog_x, dog_y = float(parts[1]), float(parts[2]) # Dual-dog message: "dog:NAME:x:y".
if parts[0] == "dog" and len(parts) == 3:
dogs["ShepherdDog"] = (float(parts[1]), float(parts[2]))
elif parts[0] == "dog" and len(parts) >= 4:
dogs[parts[1]] = (float(parts[2]), float(parts[3]))
elif parts[0] == "sheep" and len(parts) >= 4 and parts[1] != name: elif parts[0] == "sheep" and len(parts) >= 4 and parts[1] != name:
peers[parts[1]] = (float(parts[2]), float(parts[3])) peers[parts[1]] = (float(parts[2]), float(parts[3]))
dog_xy = (dog_x, dog_y) if dog_x is not None and dog_y is not None else None # Flee target = closest known dog; the flocking heuristic only needs
# one (vx, vy) repulsion vector regardless of how many dogs are out
# there. With two dogs at orthogonal axes, the sheep will see one of
# them as nearest at any moment and react to it; the other dog's
# influence enters through the sheep that does react to it pushing
# this sheep in turn (Reynolds peer-repulsion).
if dogs:
closest = min(dogs.values(), key=lambda d: math.hypot(d[0] - x, d[1] - y))
dog_xy = closest
else:
dog_xy = None
heading, speed, wander_angle = compute_heading_speed( heading, speed, wander_angle = compute_heading_speed(
x=x, y=y, penned=penned, dog_xy=dog_xy, peers=peers, x=x, y=y, penned=penned, dog_xy=dog_xy, peers=peers,
wander_angle=wander_angle, wander_angle=wander_angle,
+39 -6
View File
@@ -15,19 +15,35 @@ from pathlib import Path
class PolicyHandle: class PolicyHandle:
"""Wrap a loaded policy (+ optional VecNormalize) for ``predict(obs)``.""" """Wrap a loaded policy (+ optional VecNormalize) for ``predict(obs)``.
def __init__(self, model, vecnorm): Supports both MLP (PPO) and recurrent (RecurrentPPO/LSTM) policies.
For LSTM policies, frame_stack is forced to 1 and the LSTM hidden
state is maintained across calls; ``reset_recurrent`` is exposed for
new episodes.
"""
def __init__(self, model, vecnorm, recurrent: bool = False):
self.model = model self.model = model
self.vecnorm = vecnorm self.vecnorm = vecnorm
self.recurrent = recurrent
from herding.perception.obs import OBS_DIM from herding.perception.obs import OBS_DIM
policy_dim = int(model.observation_space.shape[0]) policy_dim = int(model.observation_space.shape[0])
if policy_dim % OBS_DIM == 0 and policy_dim // OBS_DIM >= 1: if recurrent:
self.frame_stack = 1
elif policy_dim % OBS_DIM == 0 and policy_dim // OBS_DIM >= 1:
self.frame_stack = policy_dim // OBS_DIM self.frame_stack = policy_dim // OBS_DIM
else: else:
self.frame_stack = 1 self.frame_stack = 1
self._buffer: list = [] self._buffer: list = []
self._single_dim = OBS_DIM self._single_dim = OBS_DIM
self._lstm_state = None
self._first_step = True
def reset_recurrent(self):
self._lstm_state = None
self._first_step = True
self._buffer = []
def predict(self, obs): def predict(self, obs):
import numpy as np import numpy as np
@@ -49,7 +65,15 @@ class PolicyHandle:
obs_b = stacked.reshape(1, -1) obs_b = stacked.reshape(1, -1)
if self.vecnorm is not None: if self.vecnorm is not None:
obs_b = self.vecnorm.normalize_obs(obs_b) obs_b = self.vecnorm.normalize_obs(obs_b)
action, _ = self.model.predict(obs_b, deterministic=True) if self.recurrent:
episode_start = np.array([self._first_step], dtype=bool)
action, self._lstm_state = self.model.predict(
obs_b, state=self._lstm_state,
episode_start=episode_start, deterministic=True,
)
self._first_step = False
else:
action, _ = self.model.predict(obs_b, deterministic=True)
return action[0] return action[0]
@@ -79,7 +103,16 @@ def load(model_path: str, vecnorm_path: str | None = None) -> PolicyHandle:
from stable_baselines3 import PPO from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import VecNormalize # noqa: F401 from stable_baselines3.common.vec_env import VecNormalize # noqa: F401
model = PPO.load(str(zip_path), device="auto") # Try RecurrentPPO (LSTM) first, fall back to PPO (MLP).
recurrent = False
model = None
try:
from sb3_contrib import RecurrentPPO
model = RecurrentPPO.load(str(zip_path), device="auto")
recurrent = True
except Exception:
model = PPO.load(str(zip_path), device="auto")
vecnorm = None vecnorm = None
if vecnorm_path and os.path.exists(vecnorm_path): if vecnorm_path and os.path.exists(vecnorm_path):
import pickle import pickle
@@ -87,4 +120,4 @@ def load(model_path: str, vecnorm_path: str | None = None) -> PolicyHandle:
vecnorm = pickle.load(f) vecnorm = pickle.load(f)
vecnorm.training = False vecnorm.training = False
vecnorm.norm_reward = False vecnorm.norm_reward = False
return PolicyHandle(model=model, vecnorm=vecnorm) return PolicyHandle(model=model, vecnorm=vecnorm, recurrent=recurrent)
+10
View File
@@ -0,0 +1,10 @@
# Webots reads this file before starting the controller. It tells
# Webots which Python interpreter to launch (default is system
# `python3`, which usually lacks SB3/PyTorch).
#
# Webots supports environment-variable expansion in this file, so we
# defer the interpreter path to $HERDING_PYTHON — set that variable
# once in your shell (or `tools/setup_env.sh`) before launching
# Webots and the controllers in this project will pick it up.
[python]
COMMAND = $(HERDING_PYTHON)
+369 -103
View File
@@ -1,42 +1,49 @@
"""Shepherd Dog controller (Webots). """Shepherd Dog controller (Webots).
Mode is selected by ``HERDING_MODE`` (env var, or via the Mode is selected by ``HERDING_MODE`` — read from the env var or from
``herding_runtime.cfg`` file the launcher writes since Webots strips the ``herding_runtime.cfg`` file the launcher writes (Webots strips
env vars on some setups): env vars from controller subprocesses on some setups):
strombom → canonical Strömbom (2014) collect/drive heuristic strombom → canonical Strömbom (2014) collect/drive heuristic
wrapped in ActiveScanTeacher (opening rotation + wrapped in ActiveScanTeacher (opening rotation +
walk-to-centre when the tracker briefly empties). walk-to-centre when the tracker briefly empties)
sequential → single-target "pin-and-push", same wrapper. sequential → single-target "pin-and-push", same wrapper
bc → behaviour-cloned MLP, trained on Strömbom demos. universal → mecanum-aware teacher (Strömbom + omega + recovery)
Default policy: training/runs/bc/policy.zip. bc → behaviour-cloned MLP, trained on universal demos
rl → KL-regularised PPO fine-tune of bc. Same obs/action rl → KL-regularised PPO fine-tune of `bc`
space as bc; refines time-to-pen via reward while
staying anchored to bc. Policy directories are resolved by `policy_loader` from
Default policy: training/runs/rl/policy.zip. ``training/runs/{bc,rl}_{drive}_{world}`` with a fallback to
``training/runs/{bc,rl}`` (legacy single-policy paths).
Sheep perception Sheep perception
---------------- ----------------
The dog perceives sheep through its **front-mounted 140° LiDAR** The dog perceives sheep through its front-mounted 140° LiDAR
(``protos/ShepherdDog.proto``: 180 rays, 12 m max range). Each step: (``protos/ShepherdDog.proto``: 180 rays, 12 m max range). Each step:
1. Reads ``lidar.getRangeImage()``. 1. Read ``lidar.getRangeImage()``.
2. Runs ``herding.perception.lidar_perception.detections_from_scan`` 2. Cluster returns into world-frame ``(x, y)`` estimates
to cluster returns into world-frame ``(x, y)`` sheep estimates. (``herding.perception.lidar_perception.detections_from_scan``).
3. Folds those into a ``SheepTracker`` which maintains last-seen 3. Fold detections into a ``SheepTracker``, which maintains
positions for sheep currently out of FOV and latches "penned" last-seen positions for sheep currently out of FOV, requires
once a track crosses the gate plane south. consensus across multiple frames before promoting a candidate
to a real track, and latches "penned" once a track crosses
the gate plane south.
Sheep ``emitter`` messages are read **for diagnostic logging only** Setting ``HERDING_USE_GT=1`` bypasses the tracker and feeds emitter
(GT_penned counter + auto-finish sentinel); they are never used to ground-truth positions to the policy — useful as a perception
drive the policy. Perception for control comes entirely from LiDAR. ablation for the analytic baselines.
Sheep emitter messages are otherwise read for diagnostic logging
only (``GT_penned`` counter + auto-finish sentinel); the control
loop never depends on them.
Auto-finish Auto-finish
----------- -----------
When the dog observes (via GT, read off the receiver) that all sheep When every emitter-reported sheep is penned, the controller writes
are penned, it writes ``training/.run_done`` and the launcher ``training/.run_done``. The launcher (``tools/run_webots.sh``)
(``tools/run_webots.sh``) detects it and closes Webots. This keeps detects the sentinel and closes Webots so headless sweep runs are
batch evaluation runs bounded. bounded.
""" """
import math import math
@@ -75,10 +82,10 @@ for _rk, _rv in _runtime_cfg.items():
import numpy as np import numpy as np
from controller import Robot from controller import Supervisor
from herding.control.active_scan import ActiveScanTeacher from herding.control.active_scan import ActiveScanTeacher
from herding.control.modulation import modulate_speed_near_sheep from herding.control.modulation import modulate_speed
from herding.control.sequential import compute_action as sequential_action from herding.control.sequential import compute_action as sequential_action
from herding.control.strombom import compute_action as strombom_action from herding.control.strombom import compute_action as strombom_action
from herding.control.universal import compute_action as universal_action from herding.control.universal import compute_action as universal_action
@@ -87,11 +94,31 @@ from herding.perception.lidar_perception import detections_from_scan
from herding.perception.sheep_tracker import SheepTracker from herding.perception.sheep_tracker import SheepTracker
from herding.world.diffdrive import velocity_to_mecanum_wheels, velocity_to_wheels from herding.world.diffdrive import velocity_to_mecanum_wheels, velocity_to_wheels
from herding.world.geometry import ( from herding.world.geometry import (
DOG_MAX_LINEAR, DOG_MAX_WHEEL_OMEGA, DOG_SOUTH_LIMIT,
DOG_SOUTH_LIMIT, DOG_WHEEL_BASE, DOG_WHEEL_BASE_X, PEN_ENTRY, is_penned,
DOG_WHEEL_BASE_Y, DOG_WHEEL_RADIUS,
PEN_ENTRY, is_penned_position,
) )
from herding.config import (
HERDING_WEBOTS, HERDING_MEC_WEBOTS, HERDING_MEC_WEBOTS_360,
LIDAR_WEBOTS_360, RobotConfig,
)
# Robot physical constants come from RobotConfig so they stay in sync with
# the training environment. The Webots preset uses action_smooth=0.55.
# Mecanum picks the matched preset so kinematic injection uses the same
# strafe_efficiency/bleed values the policy was trained against.
_DRIVE_MODE_PEEK = (os.environ.get("HERDING_DRIVE")
or _runtime_cfg.get("HERDING_DRIVE")
or "differential").lower()
if _DRIVE_MODE_PEEK == "mecanum":
_ROBOT_CFG: RobotConfig = HERDING_MEC_WEBOTS_360.robot
else:
_ROBOT_CFG: RobotConfig = HERDING_WEBOTS.robot
DOG_WHEEL_RADIUS = _ROBOT_CFG.wheel_radius
DOG_WHEEL_BASE = _ROBOT_CFG.wheel_base
DOG_WHEEL_BASE_X = _ROBOT_CFG.wheel_base_x
DOG_WHEEL_BASE_Y = _ROBOT_CFG.wheel_base_y
DOG_MAX_WHEEL_OMEGA = _ROBOT_CFG.max_wheel_omega
DOG_MAX_LINEAR = _ROBOT_CFG.max_linear
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
@@ -102,51 +129,74 @@ MODE = (os.environ.get("HERDING_MODE")
or _runtime_cfg.get("HERDING_MODE") or _runtime_cfg.get("HERDING_MODE")
or "bc").lower() or "bc").lower()
_VALID_MODES = ("bc", "rl", "strombom", "sequential", "universal", "calibrate")
if MODE not in _VALID_MODES:
print(f"[dog] unknown HERDING_MODE={MODE!r}; defaulting to strombom.")
MODE = "strombom"
def _resolve_policy_dir(mode: str) -> str: # Drive mode: "differential" (2-wheel) or "mecanum" (4-wheel omnidirectional).
"""Where to look for the trained policy for the given mode. DRIVE_MODE = (os.environ.get("HERDING_DRIVE")
or _runtime_cfg.get("HERDING_DRIVE")
or "differential").lower()
if DRIVE_MODE not in ("differential", "mecanum"):
print(f"[dog] unknown HERDING_DRIVE={DRIVE_MODE!r}; defaulting to differential.")
DRIVE_MODE = "differential"
# World shape — used to disambiguate the trained policy directory.
WORLD = (os.environ.get("HERDING_WORLD")
or _runtime_cfg.get("HERDING_WORLD")
or "field").lower()
# LiDAR FOV variant — "140" (default, ShepherdDog.proto) or "360"
# (ShepherdDog360.proto, FOV ablation). The launcher swaps the proto
# in the temp world file; the controller picks the matching lidar_cfg
# below so the perception pipeline interprets ray angles correctly.
LIDAR_FOV_VARIANT = (os.environ.get("HERDING_LIDAR")
or _runtime_cfg.get("HERDING_LIDAR")
or "140").lower()
if DRIVE_MODE == "mecanum" and LIDAR_FOV_VARIANT == "360":
_LIDAR_CFG = HERDING_MEC_WEBOTS_360.lidar
elif LIDAR_FOV_VARIANT == "360":
_LIDAR_CFG = LIDAR_WEBOTS_360
else:
_LIDAR_CFG = HERDING_WEBOTS.lidar
# Diagnostic: bypass LiDAR tracker and use GT emitter positions directly.
# Set HERDING_USE_GT=1 to isolate perception sim-to-real gap from policy quality.
USE_GT_PERCEPTION = bool(int(
os.environ.get("HERDING_USE_GT")
or _runtime_cfg.get("HERDING_USE_GT", "0")
))
def _resolve_policy_dir(mode: str, drive: str, world: str) -> str:
"""Where to look for the trained policy for the given mode/drive/world.
Priority: Priority:
1. HERDING_POLICY_DIR env var or runtime-cfg entry, if it points 1. HERDING_POLICY_DIR env var or runtime-cfg entry, if it points
to a real directory. to a real directory.
2. Drive-mode-specific default: 2. Canonical: training/runs/{bc,rl}_<drive>_<world>
bc → training/runs/bc_differential (or bc_mecanum) 3. Drive-only: training/runs/{bc,rl}_<drive>
rl → training/runs/rl_differential (or rl_mecanum) 4. Bare-mode: training/runs/{bc,rl}
3. Legacy path (no drive suffix): The first existing directory wins; if none exist, the canonical
bc → training/runs/bc path is returned so the loader's error message is informative.
rl → training/runs/rl
""" """
env_dir = (os.environ.get("HERDING_POLICY_DIR") env_dir = (os.environ.get("HERDING_POLICY_DIR")
or _runtime_cfg.get("HERDING_POLICY_DIR")) or _runtime_cfg.get("HERDING_POLICY_DIR"))
if env_dir and os.path.isdir(env_dir): if env_dir and os.path.isdir(env_dir):
return env_dir return env_dir
drive = DRIVE_MODE base = "rl" if mode == "rl" else "bc"
mode_default = { runs = os.path.join(_PROJECT_ROOT, "training", "runs")
"bc": os.path.join(_PROJECT_ROOT, "training", "runs", for cand in (f"{base}_{drive}_{world}", f"{base}_{drive}", base):
f"bc_{drive}"), path = os.path.join(runs, cand)
"rl": os.path.join(_PROJECT_ROOT, "training", "runs", if os.path.isdir(path):
f"rl_{drive}"), return path
} return os.path.join(runs, f"{base}_{drive}_{world}")
primary = mode_default.get(mode, mode_default["bc"])
if os.path.isdir(primary):
return primary
# Fallback: legacy paths without drive suffix.
legacy = {
"bc": os.path.join(_PROJECT_ROOT, "training", "runs", "bc"),
"rl": os.path.join(_PROJECT_ROOT, "training", "runs", "rl"),
}
fallback = legacy.get(mode, legacy["bc"])
if os.path.isdir(fallback):
return fallback
return env_dir or primary
_VALID_MODES = ("bc", "rl", "strombom", "sequential", "universal") print(f"[dog] mode={MODE} drive={DRIVE_MODE} world={WORLD}")
if MODE not in _VALID_MODES:
print(f"[dog] unknown HERDING_MODE={MODE!r}; defaulting to strombom.")
MODE = "strombom"
POLICY_DIR = _resolve_policy_dir(MODE) POLICY_DIR = _resolve_policy_dir(MODE, DRIVE_MODE, WORLD)
policy_handle = None policy_handle = None
if MODE in ("bc", "rl"): if MODE in ("bc", "rl"):
print(f"[dog] resolved POLICY_DIR={POLICY_DIR} exists={os.path.isdir(POLICY_DIR)}") print(f"[dog] resolved POLICY_DIR={POLICY_DIR} exists={os.path.isdir(POLICY_DIR)}")
@@ -157,23 +207,13 @@ if MODE in ("bc", "rl"):
except Exception as exc: except Exception as exc:
print(f"[dog] policy load failed ({exc!r}); falling back to strombom.") print(f"[dog] policy load failed ({exc!r}); falling back to strombom.")
MODE = "strombom" MODE = "strombom"
print(f"[dog] running in mode={MODE}")
# Drive mode: "differential" (2-wheel) or "mecanum" (4-wheel omnidirectional).
DRIVE_MODE = (os.environ.get("HERDING_DRIVE")
or _runtime_cfg.get("HERDING_DRIVE")
or "differential").lower()
if DRIVE_MODE not in ("differential", "mecanum"):
print(f"[dog] unknown HERDING_DRIVE={DRIVE_MODE!r}; defaulting to differential.")
DRIVE_MODE = "differential"
print(f"[dog] drive mode={DRIVE_MODE}")
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
# Control parameters # Control parameters
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
ACTION_SMOOTH = 0.55 # EMA on (vx, vy) — kills frame-to-frame jitter ACTION_SMOOTH = _ROBOT_CFG.action_smooth # EMA on (vx, vy) — kills frame-to-frame jitter
RUN_DONE_FILE = os.path.join(_PROJECT_ROOT, "training", ".run_done") RUN_DONE_FILE = os.path.join(_PROJECT_ROOT, "training", ".run_done")
@@ -210,11 +250,20 @@ def drive_diff(vx: float, vy: float, left_motor, right_motor,
def drive_mecanum(vx: float, vy: float, omega: float, def drive_mecanum(vx: float, vy: float, omega: float,
fl_motor, fr_motor, rl_motor, rr_motor, fl_motor, fr_motor, rl_motor, rr_motor,
compass, motor_max: float): compass, motor_max: float):
"""Drive the mecanum chassis kinematically.
The wheel motors are spun for visual fidelity, but the chassis
motion comes from Supervisor.setVelocity using the gym mecanum
forward-kinematics formula. Gym training and Webots deployment
therefore produce identical body motion.
"""
if math.hypot(vx, vy) < 1e-3 and abs(omega) < 1e-3: if math.hypot(vx, vy) < 1e-3 and abs(omega) < 1e-3:
fl_motor.setVelocity(0.0) fl_motor.setVelocity(0.0)
fr_motor.setVelocity(0.0) fr_motor.setVelocity(0.0)
rl_motor.setVelocity(0.0) rl_motor.setVelocity(0.0)
rr_motor.setVelocity(0.0) rr_motor.setVelocity(0.0)
if _self_node is not None:
_self_node.setVelocity([0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
return return
n = compass.getValues() n = compass.getValues()
h = math.atan2(n[0], n[1]) h = math.atan2(n[0], n[1])
@@ -231,15 +280,67 @@ def drive_mecanum(vx: float, vy: float, omega: float,
fr_motor.setVelocity(w_fr) fr_motor.setVelocity(w_fr)
rl_motor.setVelocity(w_rl) rl_motor.setVelocity(w_rl)
rr_motor.setVelocity(w_rr) rr_motor.setVelocity(w_rr)
# Kinematic body injection — derive body velocity from the same
# wheel speeds using the gym forward-kinematics formula and the
# active preset's strafe/bleed parameters.
if _self_node is not None:
r = DOG_WHEEL_RADIUS
vx_body = (w_fl + w_fr + w_rl + w_rr) * r / 4.0
vy_body_ideal = (-w_fl + w_fr + w_rl - w_rr) * r / 4.0
vy_body = vy_body_ideal * _ROBOT_CFG.strafe_efficiency
if _ROBOT_CFG.strafe_to_forward_bleed != 0.0:
vx_body += _ROBOT_CFG.strafe_to_forward_bleed * abs(vy_body_ideal)
omega_body = (-w_fl + w_fr - w_rl + w_rr) * r / (
4.0 * (DOG_WHEEL_BASE_X / 2.0 + DOG_WHEEL_BASE_Y / 2.0))
cos_h, sin_h = math.cos(h), math.sin(h)
vx_w = vx_body * cos_h - vy_body * sin_h
vy_w = vx_body * sin_h + vy_body * cos_h
_self_node.setVelocity([vx_w, vy_w, 0.0, 0.0, 0.0, omega_body])
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
# Webots devices # Webots devices
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
robot = Robot() robot = Supervisor()
timestep = int(robot.getBasicTimeStep()) timestep = int(robot.getBasicTimeStep())
# Mecanum uses Supervisor.setVelocity for chassis motion (see
# drive_mecanum); diff-drive keeps full ODE simulation.
_self_node = robot.getSelf() if DRIVE_MODE == "mecanum" else None
# Multi-shepherd axis split. When the launcher creates two dog instances
# it sets each robot's customData to "axis=x" or "axis=y"; the controller
# then attenuates the off-axis component of every action so the two
# dogs share the herding workload along orthogonal axes. customData
# empty = single-dog mode (no masking).
#
# HERDING_AXIS_LEAK controls how strict the mask is:
# 0.0 → hard mask (off-axis component zeroed; pure axis-split)
# 1.0 → no mask (both dogs run full action; equivalent to NDOGS=2
# without axis assignment)
# Defaults to 0.3 — empirically the 100/0 strict mask deadlocks once
# both dogs reach their drive standoff (the Strömbom target shrinks
# and each dog has only one degree of freedom). A small leak keeps
# pressure on the flock while preserving the "one dog leads each
# axis" coordination story.
_AXIS_TAG = (robot.getCustomData() or "").strip().lower()
if _AXIS_TAG.startswith("axis="):
DOG_AXIS = _AXIS_TAG[5:]
if DOG_AXIS not in ("x", "y"):
print(f"[dog] unknown axis={DOG_AXIS!r} in customData; ignoring.")
DOG_AXIS = None
else:
DOG_AXIS = None
try:
AXIS_LEAK = float(os.environ.get("HERDING_AXIS_LEAK")
or _runtime_cfg.get("HERDING_AXIS_LEAK", "0.3"))
except ValueError:
AXIS_LEAK = 0.3
AXIS_LEAK = max(0.0, min(1.0, AXIS_LEAK))
DOG_NAME = robot.getName()
print(f"[dog] name={DOG_NAME} axis={DOG_AXIS} leak={AXIS_LEAK:.2f}")
if DRIVE_MODE == "mecanum": if DRIVE_MODE == "mecanum":
fl_motor = robot.getDevice("front left wheel motor") fl_motor = robot.getDevice("front left wheel motor")
fr_motor = robot.getDevice("front right wheel motor") fr_motor = robot.getDevice("front right wheel motor")
@@ -264,7 +365,16 @@ receiver = robot.getDevice("receiver"); receiver.enable(timestep)
emitter = robot.getDevice("emitter") emitter = robot.getDevice("emitter")
lidar = robot.getDevice("lidar"); lidar.enable(timestep) lidar = robot.getDevice("lidar"); lidar.enable(timestep)
tracker = SheepTracker() # Tracker config: pick the preset that matches the (drive, lidar) combo
# so the tracker's consensus parameters match what the policy was
# trained against.
if DRIVE_MODE == "mecanum" and LIDAR_FOV_VARIANT == "360":
_TRACKER_CFG = HERDING_MEC_WEBOTS_360.tracker
elif DRIVE_MODE == "mecanum":
_TRACKER_CFG = HERDING_MEC_WEBOTS.tracker
else:
_TRACKER_CFG = HERDING_WEBOTS.tracker
tracker = SheepTracker(tracker_cfg=_TRACKER_CFG)
# Cosmetic ear motors — animated; not used by control. # Cosmetic ear motors — animated; not used by control.
left_ear = robot.getDevice("left ear motor") left_ear = robot.getDevice("left ear motor")
@@ -292,14 +402,110 @@ if MODE in ("strombom", "sequential"):
elif MODE == "universal": elif MODE == "universal":
analytic_teacher = ActiveScanTeacher(universal_action) analytic_teacher = ActiveScanTeacher(universal_action)
# Optional deterministic seed for the controller's RNG. The sheep
# controller seeds itself the same way, so identical HERDING_SEED
# values give reproducible trials. If unset (empty), Python uses its
# time-based default and runs are non-deterministic.
import random as _random
_seed_raw = (os.environ.get("HERDING_SEED")
or _runtime_cfg.get("HERDING_SEED")
or "").strip()
if _seed_raw:
try:
HERDING_SEED = int(_seed_raw)
except ValueError:
HERDING_SEED = None
print(f"[dog] could not parse HERDING_SEED={_seed_raw!r}; using random")
else:
_random.seed(HERDING_SEED)
else:
HERDING_SEED = None
# GT positions from sheep emitters — used **only** for the auto-finish # GT positions from sheep emitters — used **only** for the auto-finish
# sentinel and the GT_penned diagnostic line. Never fed into control. # sentinel, the GT_penned diagnostic line, and the per-sheep pen-time
# metrics printed at end of run. Never fed into control.
_gt_sheep: dict = {} _gt_sheep: dict = {}
_pen_step: dict = {} # sheep name -> step at which it first became penned
_run_done = False _run_done = False
_t_start = None # step at which we first see GT positions (sim start)
prev_action = (0.0, 0.0, 0.0) if DRIVE_MODE == "mecanum" else (0.0, 0.0) prev_action = (0.0, 0.0, 0.0) if DRIVE_MODE == "mecanum" else (0.0, 0.0)
step_count = 0 step_count = 0
# ---------------------------------------------------------------------------
# Calibration mode — apply fixed action, measure GPS displacement, compare
# against gym kinematics prediction, write results to calibrate_mecanum.log.
# ---------------------------------------------------------------------------
if MODE == "calibrate":
import json as _json
_calib_vx = float(os.environ.get("CALIB_VX", "0.5"))
_calib_vy = float(os.environ.get("CALIB_VY", "0.0"))
_calib_om = float(os.environ.get("CALIB_OM", "0.0"))
_calib_n = int(os.environ.get("CALIB_N_STEPS", "150"))
_log_path = os.path.join(_PROJECT_ROOT, "calibrate_mecanum.log")
# Settle for 5 steps so GPS stabilises.
for _ in range(5):
robot.step(timestep)
_pos0 = gps.getValues(); _x0, _y0 = _pos0[0], _pos0[1]
_n_calib = compass.getValues(); _h0 = math.atan2(_n_calib[0], _n_calib[1])
# Gym-predicted displacement using shared kinematics.
from herding.world.diffdrive import velocity_to_mecanum_wheels, mecanum_step
from herding.world.geometry import WEBOTS_DT as _DT
_xg, _yg, _hg = _x0, _y0, _h0
for _ in range(_calib_n):
_wfl, _wfr, _wrl, _wrr = velocity_to_mecanum_wheels(
_calib_vx, _calib_vy, _calib_om, _hg,
max_linear=DOG_MAX_LINEAR, wheel_radius=DOG_WHEEL_RADIUS,
lx=DOG_WHEEL_BASE_X / 2, ly=DOG_WHEEL_BASE_Y / 2,
max_wheel_omega=DOG_MAX_WHEEL_OMEGA, k_turn=4.0,
wheel_base=DOG_WHEEL_BASE,
)
_xg, _yg, _hg = mecanum_step(
_xg, _yg, _hg, _wfl, _wfr, _wrl, _wrr,
DOG_WHEEL_RADIUS, DOG_WHEEL_BASE_X / 2, DOG_WHEEL_BASE_Y / 2, _DT,
)
# Run actual Webots steps.
for _ in range(_calib_n):
_nv = compass.getValues(); _h = math.atan2(_nv[0], _nv[1])
_wfl, _wfr, _wrl, _wrr = velocity_to_mecanum_wheels(
_calib_vx, _calib_vy, _calib_om, _h,
max_linear=DOG_MAX_LINEAR, wheel_radius=DOG_WHEEL_RADIUS,
lx=DOG_WHEEL_BASE_X / 2, ly=DOG_WHEEL_BASE_Y / 2,
max_wheel_omega=DOG_MAX_WHEEL_OMEGA, k_turn=4.0,
wheel_base=DOG_WHEEL_BASE,
)
if DRIVE_MODE == "mecanum":
drive_mecanum(_calib_vx, _calib_vy, _calib_om,
fl_motor, fr_motor, rl_motor, rr_motor,
compass, MOTOR_MAX)
robot.step(timestep)
_pos1 = gps.getValues(); _x1, _y1 = _pos1[0], _pos1[1]
_nv1 = compass.getValues(); _h1 = math.atan2(_nv1[0], _nv1[1])
_T = _calib_n * _DT
_vx_w = (_x1 - _x0) / _T; _vy_w = (_y1 - _y0) / _T
_vx_g = (_xg - _x0) / _T; _vy_g = (_yg - _y0) / _T
_dh_deg = math.degrees(math.atan2(math.sin(_h1 - _h0),
math.cos(_h1 - _h0)))
def _pct(a, p): return 0.0 if abs(p) < 1e-4 else 100.0 * abs(a - p) / abs(p)
_result = (
f"cmd=(vx={_calib_vx:.2f}, vy={_calib_vy:.2f}, om={_calib_om:.2f}) "
f"steps={_calib_n}\n"
f" gym displacement: dx={_xg-_x0:.4f} dy={_yg-_y0:.4f} "
f"(vx={_vx_g:.3f} vy={_vy_g:.3f} m/s)\n"
f" webots displacement: dx={_x1-_x0:.4f} dy={_y1-_y0:.4f} "
f"(vx={_vx_w:.3f} vy={_vy_w:.3f} m/s)\n"
f" vx error: {_pct(_vx_w, _vx_g):.1f}% "
f"vy error: {_pct(_vy_w, _vy_g):.1f}% "
f"heading drift: {_dh_deg:+.1f}°\n"
)
print(_result)
with open(_log_path, "a") as _f:
_f.write(_result + "\n")
# Write the run-done sentinel so run_webots.sh closes Webots cleanly.
with open(RUN_DONE_FILE, "w") as _f:
_f.write("calibrate\n")
import sys as _sys; _sys.exit(0)
while robot.step(timestep) != -1: while robot.step(timestep) != -1:
step_count += 1 step_count += 1
@@ -321,8 +527,18 @@ while robot.step(timestep) != -1:
# ---- LiDAR perception → tracker → active sheep positions ---- # ---- LiDAR perception → tracker → active sheep positions ----
ranges = np.asarray(lidar.getRangeImage(), dtype=np.float32) ranges = np.asarray(lidar.getRangeImage(), dtype=np.float32)
detections = detections_from_scan(ranges, dog_xy[0], dog_xy[1], dog_heading) detections = detections_from_scan(
sheep_positions = tracker.update(detections) ranges, dog_xy[0], dog_xy[1], dog_heading,
detection_cfg=HERDING_WEBOTS.detection,
lidar_cfg=_LIDAR_CFG,
)
if USE_GT_PERCEPTION and _gt_sheep:
# Bypass tracker: feed GT emitter positions directly to policy/teacher.
sheep_positions = {k: v for k, v in _gt_sheep.items()
if not is_penned(v[0], v[1])}
tracker.update(detections) # still advance tracker for diagnostics
else:
sheep_positions = tracker.update(detections)
sheep_xy_list = list(sheep_positions.values()) sheep_xy_list = list(sheep_positions.values())
sheep_penned_list = [False] * len(sheep_xy_list) sheep_penned_list = [False] * len(sheep_xy_list)
@@ -331,10 +547,22 @@ while robot.step(timestep) != -1:
# ---- Action selection ---- # ---- Action selection ----
omega = 0.0 omega = 0.0
if MODE in ("bc", "rl") and policy_handle is not None: if MODE in ("bc", "rl") and policy_handle is not None:
action = policy_handle.predict(single_obs) if not sheep_positions:
vx, vy = float(action[0]), float(action[1]) # BC/RL never saw "empty obs during operation" in training (empty
if DRIVE_MODE == "mecanum" and len(action) >= 3: # obs only happened at episode end), so the policy outputs ~zero
omega = float(action[2]) # and the dog gets stuck. Fall back to an *active scan*: rotate
# the desired heading slowly so the narrow 140° FOV sweeps the
# field instead of charging in one fixed direction (which
# otherwise drives the dog into the north wall and ends the run).
scan_h = (step_count * 0.015) % (2.0 * math.pi)
vx = 0.5 * math.cos(scan_h)
vy = 0.5 * math.sin(scan_h)
omega = 0.5 if DRIVE_MODE == "mecanum" else 0.0
else:
action = policy_handle.predict(single_obs)
vx, vy = float(action[0]), float(action[1])
if DRIVE_MODE == "mecanum" and len(action) >= 3:
omega = float(action[2])
else: else:
result = analytic_teacher( result = analytic_teacher(
dog_xy, dog_heading, sheep_positions, PEN_ENTRY, dog_xy, dog_heading, sheep_positions, PEN_ENTRY,
@@ -346,7 +574,16 @@ while robot.step(timestep) != -1:
vx, vy, _mode_str = result vx, vy, _mode_str = result
# Near-sheep speed modulation (shared by every mode). # Near-sheep speed modulation (shared by every mode).
vx, vy = modulate_speed_near_sheep(vx, vy, dog_xy, sheep_positions) vx, vy = modulate_speed(vx, vy, dog_xy, sheep_positions)
# Axis-split mask for the dual-dog setup: this dog leads its
# assigned axis (full gain) and contributes AXIS_LEAK on the
# off-axis. With LEAK=0 the mask is strict; with LEAK=1 the dogs
# run identical full-power policies.
if DOG_AXIS == "x":
vy *= AXIS_LEAK
elif DOG_AXIS == "y":
vx *= AXIS_LEAK
# EMA smoothing — kills frame-to-frame action jitter. # EMA smoothing — kills frame-to-frame action jitter.
if DRIVE_MODE == "mecanum": if DRIVE_MODE == "mecanum":
@@ -366,7 +603,7 @@ while robot.step(timestep) != -1:
compass, MOTOR_MAX) compass, MOTOR_MAX)
else: else:
drive_diff(vx, vy, left_motor, right_motor, compass, MOTOR_MAX) drive_diff(vx, vy, left_motor, right_motor, compass, MOTOR_MAX)
emitter.send(f"dog:{dog_xy[0]:.4f}:{dog_xy[1]:.4f}") emitter.send(f"dog:{DOG_NAME}:{dog_xy[0]:.4f}:{dog_xy[1]:.4f}")
# Cosmetic ear wiggle. # Cosmetic ear wiggle.
ear_phase += 0.12 ear_phase += 0.12
@@ -376,33 +613,62 @@ while robot.step(timestep) != -1:
left_ear.setPosition(ear_pos) left_ear.setPosition(ear_pos)
right_ear.setPosition(-ear_pos) right_ear.setPosition(-ear_pos)
# Auto-finish: when all GT sheep are penned, write the sentinel. # First step we have GT visibility — record the simulation start
# The launcher polls for it and closes Webots so batch evals don't # so per-sheep pen times can be reported relative to it.
# hang after the task is done. Bounded by `_gt_sheep` so we don't if _gt_sheep and _t_start is None:
# fire during the first few steps while the receiver fills. _t_start = step_count
# Record the first step at which each sheep is observed penned.
for _sname, (_sx, _sy) in _gt_sheep.items():
if _sname not in _pen_step and is_penned(_sx, _sy):
_pen_step[_sname] = step_count
# Auto-finish: when all GT sheep are penned, write the sentinel
# and print the per-sheep penning summary so the operator sees
# the metrics in the terminal. The launcher polls for the
# sentinel and closes Webots cleanly.
if _gt_sheep and not _run_done: if _gt_sheep and not _run_done:
gt_active = sum(1 for x, y in _gt_sheep.values() gt_active = sum(1 for x, y in _gt_sheep.values()
if not is_penned_position(x, y)) if not is_penned(x, y))
if gt_active == 0: if gt_active == 0:
os.makedirs(os.path.dirname(RUN_DONE_FILE), exist_ok=True) os.makedirs(os.path.dirname(RUN_DONE_FILE), exist_ok=True)
open(RUN_DONE_FILE, "w").close() open(RUN_DONE_FILE, "w").close()
_run_done = True _run_done = True
print(f"[dog] all {len(_gt_sheep)} sheep penned at step " print(f"[dog] all {len(_gt_sheep)} sheep penned at step "
f"{step_count} — wrote sentinel, launcher will close Webots") f"{step_count} — wrote sentinel, launcher will close Webots")
# Only the first dog to detect the finish prints the
# summary (in dual-dog mode both run in lock-step but the
# sentinel acts as a one-shot lock).
_dt = 0.016 # Webots basicTimeStep, seconds
print("")
print(f"[results] mode={MODE} drive={DRIVE_MODE} world={WORLD} "
f"lidar={LIDAR_FOV_VARIANT} dogs={DOG_NAME}"
+ (f" seed={HERDING_SEED}" if HERDING_SEED is not None else ""))
print(f"[results] total steps: {step_count} "
f"({step_count * _dt:.1f} s simulated)")
ordered = sorted(_pen_step.items(), key=lambda kv: kv[1])
for i, (sn, st) in enumerate(ordered, 1):
rel = st - (_t_start or 0)
print(f"[results] #{i} {sn:8s} penned at step {st:>6d} "
f"({rel * _dt:6.2f} s)")
if len(ordered) >= 2:
first = ordered[0][1]
last = ordered[-1][1]
print(f"[results] gate spread: {last - first} steps "
f"({(last - first) * _dt:.2f} s) between first and last pen")
if step_count % 200 == 0: if step_count % 200 == 0:
gt_penned = sum(1 for x, y in _gt_sheep.values() gt_penned = sum(1 for x, y in _gt_sheep.values()
if is_penned_position(x, y)) if is_penned(x, y))
gt_total = len(_gt_sheep) gt_total = len(_gt_sheep)
print(f"[dog mode={MODE} drive={DRIVE_MODE}] step={step_count} " common = (f"[dog mode={MODE} drive={DRIVE_MODE}] step={step_count} "
f"GT_penned={gt_penned}/{gt_total} " f"GT_penned={gt_penned}/{gt_total} "
f"tracks_active={tracker.n_active()} " f"tracks_active={tracker.n_active()} "
f"tracks_penned={tracker.n_penned()} " f"tracks_cand={tracker.n_candidate()} "
f"detections={len(detections)} " f"tracks_penned={tracker.n_penned()} "
f"action=({vx:+.2f}, {vy:+.2f}, {omega:+.2f})" f"detections={len(detections)} "
if DRIVE_MODE == "mecanum" else f"h={math.degrees(dog_heading):+.1f}°")
f"[dog mode={MODE} drive={DRIVE_MODE}] step={step_count} " if DRIVE_MODE == "mecanum":
f"GT_penned={gt_penned}/{gt_total} " print(f"{common} action=({vx:+.2f}, {vy:+.2f}, {omega:+.2f})")
f"tracks_active={tracker.n_active()} " else:
f"tracks_penned={tracker.n_penned()} " print(f"{common} action=({vx:+.2f}, {vy:+.2f})")
f"detections={len(detections)} action=({vx:+.2f}, {vy:+.2f})")
+280
View File
@@ -0,0 +1,280 @@
# Autonomous Shepherd Robot for Livestock Herding
**G25 — Diogo Costa, Johnny Fernandes, Nelson Neto**
**Course project final report — TRI 2026**
> Draft outline. Each section has a one-line description plus the
> bullets/figures/tables that should land in it. Replace prose as you
> write; keep the structure unless something obviously doesn't fit.
---
## 1. Abstract (½ page)
One paragraph: problem (autonomous LiDAR-only herding), approach
(Strömbom-style analytic baselines + BC + KL-PPO fine-tune; two
worlds, two drives), key result (8/8 differential cells pen all
sheep in Webots; 4/8 mecanum cells pen 10/10 via kinematic
Supervisor injection; extra-merit 360° LiDAR ablation and dual-dog
axis-split both working).
## 2. Introduction (1 page)
* **Problem statement.** Shepherd a flock of 110 simulated sheep
through a gate into a pen using LiDAR-only perception. Both a
rectangular field and a circular field. Both differential and
mecanum drive.
* **Why it's hard.** No GT positions; sheep flock dynamically
(Strömbom 2014); the LiDAR returns a noisy range image, not
labelled tracks; sim-to-Webots transfer is non-trivial.
* **Contributions.**
1. End-to-end LiDAR pipeline (clustering → consensus tracker →
observation builder) that transfers training-time policies to
Webots without GT bypass.
2. Three control strategies (Strömbom, BC, KL-PPO) trained on
the same gym environment with matched-kinematics presets,
working across both worlds.
3. Identification and resolution of the mecanum sim-to-Webots
gap (kinematic Supervisor injection — see Section 7).
4. Extra-merit experiments: 360° LiDAR ablation and dual-dog
axis-split coordination.
## 3. System overview (1 page)
* `herding/` — physics-free 2D gym (sheep flocking model, LiDAR
ray-casting, perception pipeline, controller library).
* `training/` — BC + KL-PPO trainers, frame-stacked MLP policies
(stable-baselines3), evaluation harness.
* `controllers/` — Webots Python controllers for the shepherd dog
and sheep, sharing the gym's geometry/perception modules so any
fix in the gym automatically reaches the simulator.
* `protos/` — Webots PROTO files: `ShepherdDog.proto` (diff drive
140°), `ShepherdDog360.proto` (diff drive 360°),
`ShepherdDogMecanum{,360}.proto` (mecanum variants).
* **Figure**: architecture diagram with the gym ↔ Webots split,
marking where each piece sits.
## 4. Methods
### 4.1 Sheep flocking model (½ page)
* Strömbom 2014 reduced-form heuristics: repulsion from dog and
neighbours, attraction to flock centroid, weighted into a
step-wise displacement.
* Implementation notes: parameter values, why we tuned them to
match the Webots sheep controller, sheep dynamics in the round
world (cylinder boundary instead of axis-aligned walls).
### 4.2 Perception (1 page)
* **LiDAR scan → range image.** 140° front cone (default) or 360°
full sweep; horizontalResolution and noise calibrated to the
Webots sensor.
* **Clustering.** Walk rays in angular order, split on gap
threshold and multi-peak range profile; reject clusters wider
than max_span (walls), within wall_reject of perimeter, or
within static_reject of known fixed features.
* **Tracker.** Online NN association with predicted positions;
consensus_k filter (k hits within consensus_max_age steps
before promotion); static-phantom drop on promoted tracks that
fail to displace beyond `STATIC_PHANTOM_RADIUS` within
`STATIC_PHANTOM_AGE` steps; pen-latch and forget timeouts tuned
per preset.
* **Why the tracker matters.** Naïve per-frame matching produced
unstable observations that BC couldn't learn from; the consensus
filter and the static-phantom drop close the perception sim-to-
real gap for diff drive and unblock the 360° mecanum case.
### 4.3 Controllers (1 page)
* **Analytic baselines.**
* `strombom` — collect/drive heuristic with gate offset and
a round-world variant (geometric drive instead of cardinal
targets).
* `sequential` — single-sheep pin-and-push baseline, runs through
every sheep in turn.
* `universal` — adaptive analytic teacher used to collect BC
demos; switches between Strömbom and Sequential based on flock
coherence.
* **Behaviour cloning.** MLP(512,512), frame-stacked observations,
trained on 250400 universal-teacher trajectories per
(drive, world) combo.
* **KL-PPO fine-tune.** PPO with a KL-to-reference penalty against
the BC policy. Two-stage: success-pass (no time penalty) then
speed-pass (`rl_fast`, time_w<0) optional.
### 4.4 Gym kinematics matching (½ page)
* Differential drive: standard unicycle kinematics, transfers
directly.
* Mecanum: `RobotConfig.strafe_efficiency` and
`strafe_to_forward_bleed` scale the forward-kinematics formula.
The gym preset (`HERDING_MEC_WEBOTS_360`) sets these to the
values the Webots controller reads when computing the
Supervisor-injected body velocity (Section 7), so gym training
and Webots deployment produce identical chassis motion.
## 5. Experimental setup (½ page)
* Webots R2025a; `tools/run_webots.sh N MODE DRIVE WORLD` launcher.
* Seeded reproducibility (`HERDING_SEED=42` used for all the
results below).
* GT bypass (`HERDING_USE_GT=1`) available for ablations.
* Per-sheep pen-time logging in the `[results]` block.
## 6. Results
### 6.1 Differential drive (table + ½ page commentary)
| world | controller | n=5 | n=10 |
|-------------|--------------|:---:|:----:|
| field | BC | 5/5 | 10/10 |
| field | RL | 5/5 | 10/10 |
| field | Strömbom | 5/5 | 10/10 |
| field | Sequential | 5/5 | 10/10 |
| field_round | BC | 5/5 | 10/10 |
| field_round | RL | 5/5 | 10/10 |
| field_round | Strömbom | 5/5 | 10/10 |
| field_round | Sequential | 5/5 | 10/10 |
* Discussion: BC vs RL trade-offs (RL is faster, BC mimics
teacher more conservatively); Strömbom vs Sequential
(parallel-sweep vs one-at-a-time, time-to-pen comparison).
* **Figure**: pen-time bar chart per (controller, world).
### 6.2 Mecanum drive (table + 1 page commentary)
| world | controller | n=5 | n=10 |
|-------------|------------|:---:|:-----:|
| field | BC | 0/5 | 10/10 |
| field | RL | 0/5 | 10/10 |
| field_round | BC | 0/5 | 10/10 |
| field_round | RL | 0/5 | 10/10 |
> Pending: re-run after the static-phantom drop (Section 7.4) to
> confirm whether n=5 also passes.
* Discussion: kinematic Supervisor injection (Section 7); residual
n=5 phantom-track issue (Section 7.4) and how the static-phantom
drop addresses it.
* **Figure**: heading-drift comparison (with vs without kinematic
injection) over a 200-step window.
### 6.3 Extra-merit experiments (½ page each)
* **360° LiDAR ablation.** Diff drive runs with `HERDING_LIDAR=360`
pen N/N in both worlds. Trade-off: more candidate clusters per
step (more phantoms) vs full omnidirectional coverage.
* **Dual-dog axis-split.** Two shepherds via `HERDING_NDOGS=2`;
each is assigned an axis (x / y); off-axis components attenuated
by `HERDING_AXIS_LEAK`. Penned 5/5 on the diff/field setup. Note:
mecanum dual-dog was considered but skipped — mecanum's single-
dog omnidirectional coverage already saturates the available
herding capability.
## 7. The mecanum sim-to-Webots problem
> The longest section. This is the project's most interesting
> engineering story; write it like one.
### 7.1 First attempt: plain cylinder wheels + anisotropic friction
* Idea: use Webots `frictionRotation` on two contact materials
(`MecanumWheelA`, `MecanumWheelB`) to rotate the friction frame
±45°, making each cylinder act as an omni-roller via the
contact solver.
* What worked: chassis stable; pure forward motion clean.
* What broke: pure strafe came out the wrong direction, and
diagonal motion was zero. The contact-frame rotation interacts
with ODE's friction-pyramid model in a way that doesn't reproduce
textbook X-pattern.
### 7.2 Second attempt: 32 physical roller hinges
* Idea: model every roller as a passive HingeJoint capsule at ±45°
tilt; ODE solves the contact-without-slipping constraint per
roller, no friction trickery needed.
* Generated by `tools/gen_mecanum_wheels.py` (8 rollers per wheel,
X-pattern tilt: FR/RL +1, FL/RR 1).
* What worked: pure-x calibration was exact (98%+).
* What broke: dynamic policy commands made the chassis tumble.
Heading swung ±150° in 200 control steps; the LiDAR→world
transform was effectively unusable. Even with
`inertiaMatrix [_ _ 5.0 _ _ _]`, roller `dampingConstant 0.0005`,
and motor `maxTorque 3.0` (6× cut), the dynamic yaw drift was
not under control.
### 7.3 Why ODE struggles with mecanum
* 32 unconstrained roller hinges per chassis; ODE's contact solver
resolves them as independent constraints each step, and small
imbalances in the per-roller forces propagate to the body as
yaw torque.
* The roller's "rolling without slipping" idealisation is
fundamentally a kinematic constraint; trying to recover it from
Newton-Euler dynamics over 32 hinges is numerically unstable in
the timestep/solver regime Webots uses.
* This is a known limitation of mecanum in physics engines; Gazebo,
for instance, ships a mecanum plugin that bypasses the contact
solver entirely and injects a kinematic body velocity.
### 7.4 Final approach: Supervisor kinematic injection
* The chassis is moved by `Supervisor.setVelocity()` using the gym
mecanum forward-kinematics formula. Wheel motors still spin
visually, but their torque does not propagate to the body.
* Gym training and Webots deployment apply the *same* formula with
the *same* `strafe_efficiency` and `strafe_to_forward_bleed`
parameters, so the trained policy faces identical body dynamics
in both environments.
* Trade-off: we lose Newton-Euler chassis simulation on the
mecanum body. Differential drive keeps full physics. The user's
framing — "I want the process, not too focused in pure realism"
— supports this choice; it's also standard practice in academic
mecanum simulators.
### 7.5 The residual n=5 phantom problem
* With kinematic injection in place, 4/8 cells pen 10/10. But n=5
cells still fail uniformly.
* Diagnosis: the 360° LiDAR consistently produces sheep-shaped
blobs at wall corners, gate posts, and pen rails. The consensus
filter (`consensus_k=3`) doesn't reject them because they are
*consistent* — they're always at the same world position.
* Bypass via `HERDING_USE_GT=1` (ground-truth perception) pens
5/5 in 76s, confirming the policy is fine and the gap is purely
perceptual.
* **Fix:** static-phantom drop in the tracker — record each
promoted track's spawn position and running max displacement;
drop promoted tracks that have stayed within
`STATIC_PHANTOM_RADIUS=0.4 m` of their spawn position for
`STATIC_PHANTOM_AGE=400` steps (~6.4 s). Real sheep under
Strömbom dynamics move well beyond that radius; wall corners
do not. *(Implemented; results in Section 6.2 pending re-run.)*
## 8. Discussion (1 page)
* Sim-to-real lessons:
* Perception is the dominant transfer gap, not control.
* Trackers need a notion of motion to reject static phantoms;
consensus alone is insufficient when phantoms are spatially
consistent.
* For mecanum, kinematic injection is the correct abstraction.
* What we'd do differently:
* Build the parallax/motion-aware tracker into the design from
day 1.
* Calibrate Webots' mecanum behaviour earlier — we spent
significant effort on ODE tuning before stepping back to the
kinematic-injection approach.
## 9. Conclusion (¼ page)
Restate the contribution and the result counts. End on the open
question: parallax-aware tracking is a clean general fix and would
make 8/8 mecanum likely; we ran out of project budget.
## A. Reproducibility appendix (½ page)
* Hardware/OS used.
* Command lines for each row of the results tables.
* Random seed and deterministic eval settings.
+287
View File
@@ -0,0 +1,287 @@
# Project handoff — TRI_PROJ2 herding (2026-05-16)
Context for a fresh model picking this project up. Project deadline: **2026-06-04**.
Branch: `test/johnny8`. Last commits: `876e14e` (LSTM), `dd5ac66` (core fixes).
---
## What this project is
Group G25 course project: an autonomous shepherd dog that herds 110 sheep through a gate into a pen. Two worlds (rectangular `field`, circular `field_round`), two drives (`differential`, `mecanum`), and five control strategies:
- `strombom` — analytical Strömbom collect/drive heuristic
- `sequential` — analytical single-target pin-and-push baseline
- `universal` — analytical teacher used to collect BC demos
- `bc` — MLP policy trained via behaviour cloning of `universal`
- `rl` — KL-regularised PPO fine-tune of `bc`
The dog perceives sheep only through a front-mounted LiDAR (`protos/ShepherdDog.proto`).
A 2D Gym env (`training/herding_env.py`) is used for training and headless evaluation;
Webots is used for sim-to-deployment validation.
See `docs/project.md` for the formal course objectives. See
`~/.claude/projects/-home-jalf-code-TRI-PROJ2/memory/` for the running notes
(`project_state.md`, `dagger_results.md`, `lstm_results.md`, `webots_perception_gap.md`).
---
## What's working today
Everything below is **verified**, with command lines you can copy-paste.
### Analytical strategies (Strömbom, Sequential, Universal)
Work in Webots with **GT bypass** (`HERDING_USE_GT=1`) — 12/12 trials across
both worlds × {5, 10 sheep}. User has signed off on GT bypass for these
analytical baselines (they take a position list as input; GT vs LiDAR is a
perception-layer concern, not a strategy concern).
Validated by `webots_sweep_gt.log` (full matrix, all OK).
### Gym performance (clean 360° LiDAR sim, default tracker)
```
BC diff/field: 96% avg (90-100% across n=1..10)
RL diff/field: 99% avg (90-100%)
BC diff/round: 58% ← weak combo
RL diff/round: 58% ← weak combo
BC mec/field: 86%
RL mec/field: 90%
BC mec/round: 73%
RL mec/round: 79%
```
Plus a Stage-2 `rl_fast` time-penalty pass on diff/field and mec/field
(`rl_fast_*` directories) that slightly accelerates time-to-pen with similar
success.
### Webots LiDAR — 360° proto variant (`protos/ShepherdDog360.proto`)
Created today as a robustness ablation. v1 policies (trained on default 360°
gym LiDAR) transfer cleanly:
```
strombom/sequential/universal: 12/12 OK
bc diff (5 and 10 sheep, both worlds): 3/4 OK (only diff/field n=10 timed out)
bc mecanum: 0/4 — separate dynamics gap
rl any: 0/4 — RL more brittle than BC, unexpectedly
```
Validated by `webots_sweep_360.log`.
---
## What does NOT work (despite multiple attempts)
**Any learned policy (BC, RL, DAgger, LSTM) in Webots LiDAR with the
canonical 140° FOV proto.** All hit the same wall: tracker phantom-track
patterns from real Webots LiDAR don't match what the gym FP-injection model
produces, so policies trained on the gym proxy can't handle the obs they see
in Webots.
Approaches tried today (all detailed in `~/.claude/projects/.../memory/`):
| Approach | Gym proxy | Webots LiDAR 140° |
|---|---|---|
| v1 MLP + frame stack, clean training | 99% | 0/5 |
| DAgger (3 rounds, privileged teacher labels) | 12% → 38% on proxy | 0/5 |
| LSTM RecurrentPPO from scratch, 3M steps | 69% clean / 2% proxy | 0/5 |
Diagnosis: gym `HERDING_WEBOTS` preset (`herding/config.py`) is an
approximation but not faithful to actual Webots LiDAR. Real Webots produces
~4 phantom tracks per step for 5 real sheep due to wall/post/leg returns;
gym injection uses a Poisson process at static anchor points which is
distributionally different.
---
## Critical bug fixes shipped today
If you're picking this up, these are real bugs that took hours to find:
1. **Webots controllers were silently crashing on numpy import.** Webots
launched them under system `python3` (no numpy). Fixed by adding
`runtime.ini` files at `controllers/{shepherd_dog,sheep}/runtime.ini`
that point Webots to the conda env's python.
2. **FP_RATE mismatch BC=0 vs RL=2 poisoned PPO.** Default in Makefile was
`FP_RATE=2.0` for RL but `--fp-rate 0.0` hard-coded for BC demos. PPO
stalled at 0% success for 1.46M steps. Now `FP_RATE=0.0` consistent.
3. **Tracker phantom-penned tracks.** `pen_latch_depth=0.5` was too shallow
(FPs at y≈-15 latched and lived forever). Now 2.0, and penned tracks
decay at `forget_steps × 8` instead of being eternal.
4. **HERDING_WEBOTS preset tuning** in `herding/config.py`
`max_new_tracks_per_step=1`, `static_reject=1.2`. Reduces phantom-track
spawning rate but doesn't eliminate it.
---
## Recommended path to a strong June 4 deliverable
You don't need to fix the 140° LiDAR gap — there's a defensible story
already. The article framing writes itself:
> "Wide-FOV (360°) LiDAR enables clean sim-to-real transfer of learned
> shepherding policies. Narrow-FOV (140°) introduces phantom-track noise
> that current policies cannot fully reject — closing this gap is future
> work, likely requiring either a faithful gym-side LiDAR model or
> Webots-in-the-loop training."
Concrete deliverable plan:
1. **Demo video and screenshots**: use the 360° proto for BC/RL demonstrations
and GT bypass for analyticals on 140°. All combos covered.
2. **Quantitative results**: gym eval already gives success%, mean steps.
Add a flock-dispersion metric (`max(distances from CoM)` at end of
episode) — about 30 lines in `eval.py`.
3. **Collision tracking**: add a counter in `HerdingEnv.step()` for
`dog-sheep distance < 0.30 m`. Currently the env knows about
`COLLISION_DIST` but doesn't expose it in info. ~20 lines.
4. **Mecanum**: the mecanum Webots dynamics gap is **separate** from the
perception issue. `tools/calibrate_mecanum.sh` exists for this. Run
it and see if it gives matching dynamics. This is the most valuable
remaining technical task — closing the mecanum gap would let you
complete the "diff vs mecanum" extra-merit comparison in
`docs/project.md`.
5. **Round world**: gym performance is ~58-79% across approaches. The
curved walls break Strömbom's "stand behind the centroid" geometry
(the position behind sometimes lies outside the field). Two cheap
tweaks worth trying: (a) a per-episode `W_RADIUS` reward bonus for
compact flocks (gather-first behavior), (b) curriculum on the env's
`difficulty` knob (already wired in `HerdingEnv`).
Bonuses still on the table (from `docs/project.md` extra merit):
- **Multi-shepherd axis-split** — user's idea, ~1 day work. Each dog
computes one component of the analytical Strömbom action. No multi-agent
RL needed.
- **Robustness / DR ablation** — FP/wheel-slip knobs exist; run an ablation
table.
---
## Repository layout (essentials)
```
herding/
config.py # HerdingConfig dataclasses, HERDING_DEFAULT / HERDING_WEBOTS presets
control/ # strombom.py, sequential.py, universal.py (analytical teachers)
perception/ # lidar_sim.py, lidar_perception.py, sheep_tracker.py
world/ # diffdrive.py kinematics, flocking_sim.py, geometry.py (PEN_*/GATE_*/FIELD_*)
training/
herding_env.py # Gym env: HerdingEnv. ~560 lines. Step/reset/reward/obs.
bc/
collect.py # Demo collector — supports --privileged and --dagger-policy
pretrain.py # MLP BC trainer (MSE + 1-cos loss)
rl/
train.py # KL-regularised PPO fine-tune of BC
train_lstm.py # NEW today: RecurrentPPO (sb3-contrib) from scratch
eval.py # Env-side evaluator; supports MLP + LSTM policies
runs/ # Trained artifacts (bc_*, rl_*, rl_fast_*, lstm_*)
v1_clean/ # Backup of pre-DAgger artifacts
controllers/
shepherd_dog/
shepherd_dog.py # Webots controller. Mode selection via HERDING_MODE env.
policy_loader.py # Auto-detects MLP vs LSTM zip. Handles obs / state.
runtime.ini # ← critical, points Webots to conda python
sheep/
runtime.ini # ← same fix
protos/
ShepherdDog.proto # canonical 140° FOV (matches the physical robot)
ShepherdDog360.proto # 360° variant for the FOV ablation / fallback delivery
ShepherdDogMecanum.proto
Sheep.proto
worlds/
field.wbt # rectangular world
field_round.wbt # circular world
tools/
run_webots.sh # launcher: tools/run_webots.sh N MODE DRIVE WORLD
webots_sweep.sh # full LiDAR sweep across all modes × drives × worlds
webots_sweep_gt.sh # same but with HERDING_USE_GT=1
dagger_round.sh # NEW today: one-shot DAgger collect + train
calibrate_mecanum.sh # mecanum dynamics calibration (not run today)
Makefile # Top-level: make train_all, make eval_all, etc.
```
---
## Quick commands
```bash
# Run pytest (111 tests, all passing)
make test
# Train one combo end-to-end (BC → RL → eval, ~1h on 2 cores)
make DRIVE=differential WORLD=field
# Train all 4 combos (~5h)
make train_all
# Eval an existing policy directory in gym
python -m training.eval --policy training/runs/rl_differential_field \
--max-flock 10 --max-steps 15000 --n-seeds 10 \
--drive-mode differential --world field
# Webots — analytical, GT bypass (this works for all combos)
HERDING_USE_GT=1 tools/run_webots.sh 5 strombom differential field
# Webots — BC with the 360° proto (currently the 140° proto is active;
# swap by editing protos/ShepherdDog.proto or use the 360° variant directly)
tools/run_webots.sh 5 bc differential field
# Headless full sweep (~80 min)
tools/webots_sweep.sh webots_sweep.log
# Train LSTM (sb3-contrib must be installed)
python -m training.rl.train_lstm \
--out training/runs/lstm_differential_field \
--total-timesteps 3000000 --use-webots-preset \
--drive-mode differential --world field
```
---
## Hardware/environment
- 3.8 GB RAM, 8 GB swap, 2 cores. Memory pressure is real — saw the
OS OOM-kill RL training during chained `train_all` once. If you re-run
full pipelines, monitor memory and consider splitting.
- Conda env: `tir` at `/home/jalf/miniconda3/envs/tir/`. Has SB3,
sb3-contrib, PyTorch, gymnasium. Webots controllers point to this
python via the new `runtime.ini` files.
- Webots installed at `/usr/local/webots/`. Headless mode requires
`xvfb-run -a` (no X display on this machine).
---
## What I'd suggest for a fresh attempt at the 140° LiDAR gap
If the user wants you to keep pushing on it, the highest-EV experiment
not yet tried is:
**Consensus tracker** — modify `herding/perception/sheep_tracker.py` to
require K consecutive detections within a small radius before promoting
a track to "real." Phantom tracks from sporadic wall returns wouldn't
survive the K-step consensus; real sheep continuously visible in FOV
would. The current `max_new_tracks_per_step=1` rate-limits new tracks
but every detection still spawns one immediately.
Implementation sketch: add a "candidate" track type that doesn't appear
in `get_positions()`. After K (e.g. 3-5) consecutive matched detections,
promote candidate → real track. Roughly 30-50 lines of code.
This is a tracker-level fix at deploy time only, so it wouldn't require
retraining the policies — v1 BC/RL should transfer cleanly if the tracker
output looks more like what they were trained on (one position per real
sheep, no phantoms).
I would NOT recommend more architectural training experiments (DAgger
round 4, larger LSTM, etc.) — three independent approaches today already
showed the bottleneck is upstream of the policy.
+86
View File
@@ -0,0 +1,86 @@
# Status — 2026-05-18
Current snapshot of what works in Webots, and what design choices got us here.
## Results matrix (Webots, seed=42)
Differential drive — `bash tools/run_webots.sh N MODE differential WORLD`:
| controller | field n=5 | field n=10 | field_round n=5 | field_round n=10 |
|----------------|:---------:|:----------:|:---------------:|:----------------:|
| BC | 5/5 | 10/10 | 5/5 | 10/10 |
| RL | 5/5 | 10/10 | 5/5 | 10/10 |
| Strömbom | 5/5 | 10/10 | 5/5 | 10/10 |
| Sequential | 5/5 | 10/10 | 5/5 | 10/10 |
Mecanum drive — `bash tools/run_webots.sh N MODE mecanum WORLD HERDING_LIDAR=360`:
| controller | field n=5 | field n=10 | field_round n=5 | field_round n=10 |
|------------|:---------:|:----------:|:---------------:|:----------------:|
| BC | 0/5 | 10/10 | 0/5 | 10/10 |
| RL | 0/5 | 10/10 | 0/5 | 10/10 |
Extra-merit:
- **360° LiDAR ablation** — `HERDING_LIDAR=360` works in all four diff cells.
- **Dual-dog axis-split** — `HERDING_NDOGS=2 HERDING_AXIS_LEAK=0.3` pens 5/5 on diff.
## Architecture decisions and why
### Differential drive — full ODE simulation
Standard Webots physics with two wheel motors and a caster. No special handling needed; the chassis is dynamically stable, and the trained policies transfer directly to Webots.
### Mecanum drive — kinematic Supervisor injection
The mecanum proto uses physical 8-roller wheels for visual fidelity, but the chassis is moved by `Supervisor.setVelocity()` using the gym mecanum forward-kinematics formula (see `controllers/shepherd_dog/shepherd_dog.py::drive_mecanum`).
We explored two other paths before settling here:
1. **Plain cylinder wheels + anisotropic ContactProperties.** Tried `frictionRotation ±0.7854` on the wheel contact frame. Strafe motion came out the wrong direction and diagonals zeroed out. Discarded.
2. **Full ODE simulation on 32 physical roller hinges.** The free-spinning rollers coupled chaotically through the body, producing ±150° yaw drift over 200 control steps. Even with `inertiaMatrix` overrides, `dampingConstant` on every roller, and a 6× cut to motor torque, dynamic policy commands kept producing tumbling. Discarded.
3. **Kinematic Supervisor injection (current).** ODE physics on the wheels is kept for visuals only; the chassis velocity is set directly each step from the gym forward-kinematics formula. Gym training and Webots deployment produce identical body motion. Yaw drift is zero by construction.
This is not a hack — it matches how most academic mecanum sims work (e.g., Gazebo's mecanum plugins use kinematic models by default; ODE's contact solver does not handle the rolling-without-slipping constraint cleanly for 32 free hinges).
### Why n=5 mecanum fails (and n=10 passes)
The 360° LiDAR consistently produces 08 detections per frame at n=5 — 5 from real sheep plus 13 "phantom" clusters from gate posts, wall fragments, and pen rails. The tracker's consensus filter promotes a candidate to "active" after `consensus_k=3` hits within 20 steps, and phantoms satisfy that easily because they're spatially consistent.
With n=10 real sheep the 10 active slots fill with real sheep before phantoms compete. With n=5 there are ~5 free slots and the phantoms occupy them; the policy then chases ghosts (verified: with `HERDING_USE_GT=1` perception bypass, n=5 pens 5/5 in 76 s).
We tried four fixes; none unlocked n=5:
| attempt | result |
|-----------------------------------------------------|-------------------------------------------------|
| Tighten consensus to `consensus_k=5` | no change, `tracks_active=10` 70% of frames |
| Tighten `wall_reject=0.9`, `static_reject=1.5` | no change |
| Static-phantom drop (track displacement from spawn) | phantoms are *not* spatially static — debug logs showed phantom tracks bouncing 422 m across the field as data association reassigned them each frame |
| Merge near-duplicate detections (≤0.5 m) | phantoms aren't fragmentation either |
The phantom tracks are caused by **data-association noise**: when the tracker has more slots than real sheep, the leftover tracks attach themselves to whatever cluster is closest each frame, even if that cluster has nothing to do with their original spawn position. The fix would need either parallax-aware tracking (require multi-vantage confirmation before promotion) or training with simulated phantom noise. Both are real surgery; out of scope for the 2026-06-11 deadline.
**Workaround for the demo:** running n=10 in Webots always pens 10/10; the n=5 cells produce identical kinematic behaviour and can be reported from the gym evaluation (success rate, time-to-pen) where the gym tracker doesn't accumulate phantoms.
## File map (what changed in this push)
```
herding/config.py mecanum presets keep matched
strafe scaling (strafe_eff=0.26,
bleed=-0.40) for kinematic injection
controllers/shepherd_dog/shepherd_dog.py
Supervisor() + drive_mecanum kinematic
injection via _self_node.setVelocity
protos/ShepherdDogMecanum.proto supervisor TRUE; physics tuning
protos/ShepherdDogMecanum360.proto reverted (ODE no longer load-bearing)
tools/gen_mecanum_wheels.py wheels regen-script (clean)
tools/run_webots.sh contact-properties comment cleaned
training/{bc/collect,rl/train}.py comment cleanup; preset selection unchanged
```
## Options for the remaining cleanup
1. **Keep matched preset (0.26, -0.40)**. Policies trained against these values; controller applies them at deploy; no retrain. *Current state*.
2. **Switch preset to textbook (1.0, 0.0) and retrain mecanum BC+RL** (~6h). Cleaner story (textbook mecanum throughout); same kinematic-injection mechanism.
Either is defensible. (1) ships faster; (2) is more "pure".
+480
View File
@@ -0,0 +1,480 @@
"""Central configuration dataclasses for the herding simulation.
Every tunable parameter lives here as a frozen dataclass field — LiDAR
spec, cluster detection thresholds, tracker gates, robot kinematics,
and domain-randomisation knobs — composed into :class:`HerdingConfig`.
Usage — accept the defaults::
env = HerdingEnv()
Override a subset::
cfg = HerdingConfig(tracker=TrackerConfig(forget_steps=60))
env = HerdingEnv(herding_cfg=cfg)
Use a named preset::
env = HerdingEnv(herding_cfg=HERDING_WEBOTS) # 140° FOV
env = HerdingEnv(herding_cfg=HERDING_MEC_WEBOTS) # + mecanum slip
Design notes
------------
* All dataclasses are frozen so instances are immutable after construction.
* This module must not import from other ``herding.*`` packages —
field-geometry constants live in ``herding.world.geometry`` because
they depend on the world variant selected at runtime via
``HERDING_WORLD``, which would create an import cycle here.
"""
from __future__ import annotations
import math
from dataclasses import dataclass, field, replace
# ---------------------------------------------------------------------------
# LiDAR hardware spec
# ---------------------------------------------------------------------------
@dataclass(frozen=True)
class LidarConfig:
"""Parameters of the simulated / physical LiDAR sensor.
The two canonical presets are :data:`LIDAR_FULL` (360°, oracle mode)
and :data:`LIDAR_WEBOTS` (140°/180-ray, matches the ShepherdDog proto).
"""
n_rays: int = 360
"""Number of rays in the scan."""
fov_rad: float = 2.0 * math.pi
"""Full field-of-view in radians, centred on the robot's forward axis."""
max_range: float = 12.0
"""Maximum detectable range in metres."""
noise_std: float = 0.005
"""Gaussian standard deviation (metres) applied to each hit reading."""
sheep_radius: float = 0.30
"""Effective disc radius of a sheep in the 2-D LiDAR plane (metres)."""
post_radius: float = 0.25
"""Effective disc radius of gate / corner posts (metres)."""
def __post_init__(self) -> None:
if self.n_rays < 1:
raise ValueError(f"n_rays must be ≥ 1, got {self.n_rays}")
if not (0.0 < self.fov_rad <= 2.0 * math.pi):
raise ValueError(f"fov_rad must be in (0, 2π], got {self.fov_rad:.4f}")
if self.max_range <= 0.0:
raise ValueError(f"max_range must be > 0, got {self.max_range}")
# Named presets -----------------------------------------------------------
LIDAR_FULL = LidarConfig(
n_rays=360,
fov_rad=2.0 * math.pi,
)
"""360° full-circle scan — oracle / ablation mode."""
LIDAR_WEBOTS = LidarConfig(
n_rays=180,
fov_rad=math.radians(140.0),
)
"""Matches the ShepherdDog.proto Lidar device (180 rays, 140° FOV).
Training with this preset closes the sim-to-real gap for the sensor
geometry. Because the observation is built from tracker output (not raw
rays), a policy trained here can be deployed on a wider-FOV LiDAR (e.g.
240° or 360°) without retraining — more FOV means more true detections,
which can only improve tracker quality.
"""
LIDAR_WEBOTS_360 = LidarConfig(
n_rays=360,
fov_rad=2.0 * math.pi,
max_range=15.0,
)
"""Matches ShepherdDog360.proto (360 rays, 360° FOV, 15 m range).
Used by the FOV-ablation Webots launch (HERDING_LIDAR=360). The wider
range and full surround visibility hand the tracker more detections
per step, so the trained policy — already trained on 360° gym
perception — sees an observation distribution closer to training.
"""
# ---------------------------------------------------------------------------
# Cluster-detection pipeline
# ---------------------------------------------------------------------------
@dataclass(frozen=True)
class DetectionConfig:
"""Parameters for the LiDAR-scan → detection clustering pipeline."""
gap_threshold: float = 0.6
"""Adjacent hit-points farther apart than this (metres) start a new cluster."""
max_cluster_span: float = 1.5
"""Clusters wider than this (metres) are rejected as walls / structures."""
range_hit_eps: float = 0.05
"""A ray is considered a hit if ``range < max_range - range_hit_eps``."""
split_range_gap: float = 0.20
"""Range increase within a cluster that triggers a multi-peak split."""
wall_reject: float = 0.5
"""Drop detections within this distance (metres) of any field wall."""
static_reject: float = 0.8
"""Drop detections within this distance (metres) of known static features
(gate posts, field corners)."""
def __post_init__(self) -> None:
if self.wall_reject < 0.0:
raise ValueError(f"wall_reject must be ≥ 0, got {self.wall_reject}")
if self.static_reject < 0.0:
raise ValueError(f"static_reject must be ≥ 0, got {self.static_reject}")
# ---------------------------------------------------------------------------
# Multi-target tracker
# ---------------------------------------------------------------------------
@dataclass(frozen=True)
class TrackerConfig:
"""Parameters for the nearest-neighbour sheep tracker."""
gate_m: float = 2.5
"""Primary NN association gate in metres (recently observed tracks)."""
reacquire_gate_m: float = 4.5
"""Wider gate used when re-acquiring tracks stale for ≥ ``reacquire_min_age`` steps."""
reacquire_min_age: int = 20
"""Minimum staleness (steps) before the wider re-acquisition gate activates."""
penned_gate_m: float = 4.0
"""Gate for matching new detections to already-penned tracks."""
forget_steps: int = 200
"""Delete an active track that has not been observed for this many steps (~3.2 s)."""
predict_steps: int = 120
"""Extrapolate a track's position using constant velocity for this many steps (~1.9 s)."""
velocity_clamp: float = 1.0
"""Maximum predicted speed (m/s) used during extrapolation."""
max_new_tracks_per_step: int = 10
"""Maximum number of new tracks that may be spawned in a single step.
Capping this limits the damage from LiDAR false-positive bursts (e.g.
wall reflections in Webots) that would otherwise flood the track set.
The default (10 = MAX_SHEEP) preserves the original behaviour; reduce
to 23 for Webots deployment robustness.
"""
pen_latch_depth: float = 0.0
"""Minimum depth past the gate line (metres) before a track is latched
as penned. 0.0 = original behaviour (latch at y ≤ GATE_Y). Increase
to 0.5 for Webots to prevent gate-hardware LiDAR reflections near y=-15
from permanently consuming tracker slots as false "penned" sheep.
"""
consensus_k: int = 3
"""New tracks must accumulate this many matches before they appear in
``get_positions``. ``1`` disables the candidate stage entirely;
``3`` (default) requires three nearby confirmations within
``consensus_max_age`` and reliably filters single-shot detection
splits / out-of-range stragglers that confuse the policy on the
round field while real sheep promote in ~50 ms (3 frames).
"""
consensus_radius_m: float = 0.5
"""Maximum distance (metres) between successive matches for a candidate
to age toward promotion. Tighter than ``gate_m`` so wall-cluster
centroid jitter cannot keep a phantom alive. Real sheep move
≪ 0.05 m / step at max speed so this gate is very loose for them.
"""
consensus_max_age: int = 15
"""A candidate that has not been matched for this many steps is dropped.
Short enough that a one-shot phantom can't keep itself alive, long
enough that a real sheep glimpsed twice in a short interval
confirms.
"""
def __post_init__(self) -> None:
if self.forget_steps < 1:
raise ValueError(f"forget_steps must be ≥ 1, got {self.forget_steps}")
if self.max_new_tracks_per_step < 1:
raise ValueError(
f"max_new_tracks_per_step must be ≥ 1, got {self.max_new_tracks_per_step}"
)
if self.consensus_k < 1:
raise ValueError(f"consensus_k must be ≥ 1, got {self.consensus_k}")
if self.consensus_radius_m <= 0.0:
raise ValueError(
f"consensus_radius_m must be > 0, got {self.consensus_radius_m}"
)
if self.consensus_max_age < 1:
raise ValueError(
f"consensus_max_age must be ≥ 1, got {self.consensus_max_age}"
)
# ---------------------------------------------------------------------------
# Robot physical specification
# ---------------------------------------------------------------------------
@dataclass(frozen=True)
class RobotConfig:
"""Physical parameters of the shepherd-dog robot.
Values mirror ``protos/ShepherdDog.proto`` and ``protos/ShepherdDogMecanum.proto``.
"""
wheel_radius: float = 0.038
"""Wheel radius in metres."""
wheel_base: float = 0.28
"""Axle-to-axle distance for differential drive (metres)."""
wheel_base_x: float = 0.28
"""Front-to-back axle distance for mecanum drive (metres)."""
wheel_base_y: float = 0.28
"""Left-to-right axle distance for mecanum drive (metres)."""
max_wheel_omega: float = 70.0
"""Maximum wheel angular velocity (rad/s)."""
action_smooth: float = 0.0
"""Exponential moving-average coefficient applied to actions inside the env.
``0.0`` means no smoothing (gym default).
``0.55`` matches the hard-coded EMA in ``shepherd_dog.py`` — use this
when training so the policy learns to act through the same filter it
sees at deployment.
"""
strafe_efficiency: float = 1.0
"""Mecanum strafe magnitude as a fraction of textbook X-pattern.
``1.0`` (default) is the ideal kinematic mecanum. Values below 1
model strafe slip; the Webots controller reads the same value and
applies it in the Supervisor velocity injection, so gym training
and Webots deployment see identical body motion. No effect on
differential drive.
"""
strafe_to_forward_bleed: float = 0.0
"""Fraction of ideal strafe magnitude that bleeds into body-frame x.
``0.0`` (default) = no bleed. Non-zero values add
``strafe_to_forward_bleed * |vy_body_ideal|`` to ``vx_body`` to
model the consistent forward (or backward) drift that some
mecanum chassis exhibit during pure-strafe commands. No effect on
differential drive.
"""
def __post_init__(self) -> None:
if not (0.0 <= self.action_smooth < 1.0):
raise ValueError(
f"action_smooth must be in [0, 1), got {self.action_smooth}"
)
if not (0.0 < self.strafe_efficiency <= 1.0):
raise ValueError(
f"strafe_efficiency must be in (0, 1], got {self.strafe_efficiency}"
)
@property
def max_linear(self) -> float:
"""Maximum achievable linear speed (m/s)."""
return self.wheel_radius * self.max_wheel_omega
# ---------------------------------------------------------------------------
# Domain randomisation
# ---------------------------------------------------------------------------
@dataclass(frozen=True)
class DomainRandomConfig:
"""Parameters that inject physics / sensor noise for domain randomisation.
All values default to 0 (disabled) so the base env is deterministic and
backwards-compatible. Enable them gradually to close the sim-to-real gap.
"""
fp_rate: float = 0.0
"""Mean number of false-positive detections injected per step (Poisson λ).
FPs are placed near static features (walls, posts) with positional
noise ``fp_std_pos``, mimicking the spurious clusters Webots' physical
LiDAR returns from 3D geometry.
"""
fp_std_pos: float = 0.3
"""Positional standard deviation (metres) of injected false-positive clusters."""
wheel_slip_std: float = 0.0
"""Gaussian noise standard deviation (rad/s) added to each wheel speed
before kinematic integration. Models real-world wheel slip and motor
variation. Suggested starting value: 0.05.
"""
compass_noise_std: float = 0.0
"""Gaussian noise standard deviation (radians) added to the heading
reading each step. Models magnetometer drift in Webots.
Suggested starting value: 0.02.
"""
def __post_init__(self) -> None:
if self.fp_rate < 0.0:
raise ValueError(f"fp_rate must be ≥ 0, got {self.fp_rate}")
if self.wheel_slip_std < 0.0:
raise ValueError(f"wheel_slip_std must be ≥ 0, got {self.wheel_slip_std}")
if self.compass_noise_std < 0.0:
raise ValueError(f"compass_noise_std must be ≥ 0, got {self.compass_noise_std}")
# ---------------------------------------------------------------------------
# Aggregate config
# ---------------------------------------------------------------------------
@dataclass(frozen=True)
class HerdingConfig:
"""Root configuration object passed to :class:`~training.herding_env.HerdingEnv`.
Sub-configs default to the original simulation parameters so that
``HerdingEnv()`` and ``HerdingEnv(herding_cfg=HerdingConfig())`` produce
identical behaviour.
"""
lidar: LidarConfig = field(default_factory=LidarConfig)
detection: DetectionConfig = field(default_factory=DetectionConfig)
tracker: TrackerConfig = field(default_factory=TrackerConfig)
robot: RobotConfig = field(default_factory=RobotConfig)
domain_random: DomainRandomConfig = field(default_factory=DomainRandomConfig)
def replace(self, **kwargs) -> "HerdingConfig":
"""Return a new config with selected top-level sub-configs replaced.
Example::
cfg = HERDING_WEBOTS.replace(
domain_random=DomainRandomConfig(fp_rate=2.0, wheel_slip_std=0.05)
)
"""
return replace(self, **kwargs)
# ---------------------------------------------------------------------------
# Named full-pipeline presets
# ---------------------------------------------------------------------------
HERDING_DEFAULT = HerdingConfig()
"""Original simulation defaults — zero behaviour change."""
HERDING_WEBOTS = HerdingConfig(
lidar=LIDAR_WEBOTS,
detection=DetectionConfig(wall_reject=0.5, static_reject=1.2),
tracker=TrackerConfig(
forget_steps=300,
max_new_tracks_per_step=1,
pen_latch_depth=2.0,
predict_steps=180,
consensus_k=3,
consensus_radius_m=0.3,
consensus_max_age=20,
),
robot=RobotConfig(action_smooth=0.55),
)
HERDING_MEC_WEBOTS = HerdingConfig(
lidar=LIDAR_WEBOTS,
detection=DetectionConfig(wall_reject=0.5, static_reject=1.2),
tracker=TrackerConfig(
forget_steps=300,
max_new_tracks_per_step=1,
pen_latch_depth=2.0,
predict_steps=180,
consensus_k=3,
consensus_radius_m=0.3,
consensus_max_age=20,
),
robot=RobotConfig(
action_smooth=0.55,
strafe_efficiency=0.26,
strafe_to_forward_bleed=-0.40,
),
)
"""Mecanum + 140° LiDAR preset.
Mirrors HERDING_WEBOTS but with mecanum-specific kinematic scaling
(``strafe_efficiency`` and ``strafe_to_forward_bleed``) applied to
the gym forward-kinematics formula. The Webots controller reads
these same values via ``RobotConfig`` and feeds them through the
Supervisor velocity injection, so gym and Webots produce identical
body motion. Diff-drive ignores both fields.
"""
HERDING_MEC_WEBOTS_360 = HerdingConfig(
lidar=LIDAR_WEBOTS_360,
# Looser detection thresholds for the wider FOV — the 360° scan
# catches far walls, gate posts and pen rails the 140° front cone
# never sees, so the cluster/feature filters need slightly more
# margin to keep promotion rates similar.
detection=DetectionConfig(wall_reject=0.6, static_reject=1.2),
tracker=TrackerConfig(
forget_steps=300,
max_new_tracks_per_step=2, # 360° gives more candidates per step
pen_latch_depth=3.0,
predict_steps=180,
consensus_k=3,
consensus_radius_m=0.3,
consensus_max_age=20,
),
robot=RobotConfig(
action_smooth=0.55,
strafe_efficiency=0.26,
strafe_to_forward_bleed=-0.40,
),
)
"""Mecanum + 360° LiDAR preset (the deployable mecanum target).
The 360° FOV gives the policy perception coverage in every direction,
which matches the omnidirectional motion the mecanum chassis can
produce. Used for both gym training and Webots deployment so the
trained policy sees the same observation geometry it will face at
deploy time.
"""
"""Webots-matched training preset.
Changes vs HERDING_DEFAULT:
* LiDAR: 180 rays / 140° FOV matching ShepherdDog.proto hardware
* Detection: wall_reject kept at 0.5 m (original default; static_reject
handles post FPs; 1.0 m was too aggressive near the south gate)
* Tracker:
- consensus_k=3, radius=0.3 m, max_age=20 (~320 ms window): a new
detection must be confirmed by two more nearby detections within
a tight 0.3 m radius to promote. Real sheep barely move
frame-to-frame (≪0.05 m/step) so they easily self-confirm while
the dog is rotating across them; wall-return phantoms whose
cluster centroid jitters by more than 0.3 m as the dog moves
can't accumulate three nearby hits and decay as separate
candidates.
- forget_steps=300 (~4.8 s) + predict_steps=180 (~2.9 s): once a
real sheep is confirmed, it lives in tracker memory long enough
for the policy — trained on 360° full-visibility obs — to plan
while the dog sweeps a sparse cone across the field. Set short
enough that any phantom that does leak through promotion dies
after the dog walks away from the wall that created it.
- max_new_tracks_per_step=1 still rate-caps spawn bursts.
* Robot: action_smooth 0.0 → 0.55 (matches Webots controller EMA)
"""
+3 -3
View File
@@ -12,7 +12,7 @@ exploration behaviours:
beyond the 12 m LiDAR range). beyond the 12 m LiDAR range).
When the tracker has detections the base teacher's action is used, When the tracker has detections the base teacher's action is used,
post-processed by ``modulate_speed_near_sheep`` so the dog doesn't post-processed by ``modulate_speed`` so the dog doesn't
charge the flock. charge the flock.
""" """
@@ -20,7 +20,7 @@ from __future__ import annotations
import math import math
from herding.control.modulation import modulate_speed_near_sheep from herding.control.modulation import modulate_speed
INITIAL_SCAN_STEPS = 80 # ≈1.3 s — covers one full rotation INITIAL_SCAN_STEPS = 80 # ≈1.3 s — covers one full rotation
@@ -117,6 +117,6 @@ class ActiveScanTeacher:
else: else:
vx, vy, mode = result vx, vy, mode = result
omega = 0.0 omega = 0.0
vx, vy = modulate_speed_near_sheep(vx, vy, dog_xy, sheep_positions) vx, vy = modulate_speed(vx, vy, dog_xy, sheep_positions)
self.last_action = (vx, vy) self.last_action = (vx, vy)
return vx, vy, omega, mode return vx, vy, omega, mode
+2 -2
View File
@@ -1,6 +1,6 @@
"""Shared action post-processing. """Shared action post-processing.
Every dog mode routes its action through ``modulate_speed_near_sheep`` Every dog mode routes its action through ``modulate_speed``
so the magnitude is reduced near sheep — direction (intent) is so the magnitude is reduced near sheep — direction (intent) is
preserved. preserved.
""" """
@@ -14,7 +14,7 @@ SLOW_NEAR_SHEEP = 2.5 # m — distance below which action norm is scaled down
MIN_SPEED = 0.30 # action norm at zero distance MIN_SPEED = 0.30 # action norm at zero distance
def modulate_speed_near_sheep( def modulate_speed(
vx: float, vy: float, vx: float, vy: float,
dog_xy: tuple[float, float], dog_xy: tuple[float, float],
sheep_positions, sheep_positions,
+56 -48
View File
@@ -1,9 +1,23 @@
"""Sequential "pin-and-push" shepherd-dog controller. """Adaptive sequential shepherd-dog controller.
Single-target alternative to Strömbom: each step, target the sheep Three-phase strategy:
closest to the pen, park behind it, drive it through; once it latches
penned the next-closest sheep becomes the target. Naturally queues 1. **Collect** (flock scattered): Strömbom collect — park behind the
the flock through a narrow gate. furthest sheep and push it toward the CoM. Identical to the
Strömbom heuristic; keeps the flock together.
2. **Drive** (flock compact, >STRAGGLER_THRESHOLD active): Strömbom
drive — park behind the CoM relative to the pen and push the whole
group through the gate.
3. **Targeted** (≤STRAGGLER_THRESHOLD sheep remain active): single-
target push on the sheep closest to the pen entry. Safe to isolate
individual sheep once the flock is nearly exhausted.
The original pure pin-and-push (Phase 3 only) caused flock scatter in
Webots physics whenever the dog tried to isolate a sheep while others
were still spread across the field. Phases 12 handle the bulk of
herding with flock-aware Strömbom logic; Phase 3 cleans up stragglers.
""" """
import math import math
@@ -11,64 +25,58 @@ import math
from herding.world.geometry import GATE_Y, PEN_ENTRY, in_pen from herding.world.geometry import GATE_Y, PEN_ENTRY, in_pen
DELTA_DRIVE = 1.5 # standoff behind the target sheep F_FACTOR = 4.0 # collect/drive threshold: radius > F_FACTOR·√n
APPROACH_GAIN = 1.0 # action magnitude scale (1 = full speed) DELTA_COLLECT = 1.5 # standoff behind the furthest sheep (collect)
DELTA_DRIVE = 2.0 # standoff behind CoM (drive)
DELTA_TARGET = 1.5 # standoff behind single target sheep (targeted)
STRAGGLER_THRESHOLD = 2 # switch to targeted push when ≤ this many active
def _unit(x, y): def _unit(x: float, y: float):
d = math.hypot(x, y) d = math.hypot(x, y)
if d < 1e-6: if d < 1e-6:
return 0.0, 0.0 return 0.0, 0.0
return x / d, y / d return x / d, y / d
def _is_active(x, y) -> bool: def _is_active(x: float, y: float) -> bool:
return (not in_pen(x, y)) and y > GATE_Y return (not in_pen(x, y)) and y > GATE_Y
def compute_action(dog_xy, sheep_positions, pen_target=PEN_ENTRY): def compute_action(dog_xy, sheep_positions, pen_target=PEN_ENTRY):
"""Return ``(vx, vy, mode)`` — same call signature as Strömbom.""" """Return ``(vx, vy, mode)`` — same signature as Strömbom."""
active = [(name, x, y) for name, (x, y) in sheep_positions.items() active = [(x, y) for (x, y) in sheep_positions.values() if _is_active(x, y)]
if _is_active(x, y)]
if not active: if not active:
return 0.0, 0.0, "idle" return 0.0, 0.0, "idle"
name, sx, sy = min( n = len(active)
active, com_x = sum(p[0] for p in active) / n
key=lambda s: math.hypot(s[1] - pen_target[0], s[2] - pen_target[1]), com_y = sum(p[1] for p in active) / n
) dists = [math.hypot(p[0] - com_x, p[1] - com_y) for p in active]
radius = max(dists)
ux, uy = _unit(sx - pen_target[0], sy - pen_target[1]) if n <= STRAGGLER_THRESHOLD:
tx = sx + DELTA_DRIVE * ux # Targeted: push the sheep closest to the pen entry individually.
ty = sy + DELTA_DRIVE * uy sx, sy = min(active,
key=lambda p: math.hypot(p[0] - pen_target[0],
p[1] - pen_target[1]))
ux, uy = _unit(sx - pen_target[0], sy - pen_target[1])
tx, ty = sx + DELTA_TARGET * ux, sy + DELTA_TARGET * uy
mode = "targeted"
elif radius > F_FACTOR * math.sqrt(n):
# Collect: aim behind the furthest sheep from the CoM.
idx = max(range(n), key=lambda i: dists[i])
sx, sy = active[idx]
ux, uy = _unit(sx - com_x, sy - com_y)
tx, ty = sx + DELTA_COLLECT * ux, sy + DELTA_COLLECT * uy
mode = "collect"
else:
# Drive: push the whole compact flock toward the gate.
ux, uy = _unit(com_x - pen_target[0], com_y - pen_target[1])
tx, ty = com_x + DELTA_DRIVE * ux, com_y + DELTA_DRIVE * uy
mode = "drive"
ax, ay = _unit(tx - dog_xy[0], ty - dog_xy[1]) ax, ay = _unit(tx - dog_xy[0], ty - dog_xy[1])
return APPROACH_GAIN * ax, APPROACH_GAIN * ay, f"drive:{name}" return ax, ay, mode
def compute_action_debug(dog_xy, sheep_positions, pen_target=PEN_ENTRY):
"""``compute_action`` plus a debug dict (target, drive point)."""
active = [(name, x, y) for name, (x, y) in sheep_positions.items()
if _is_active(x, y)]
if not active:
return 0.0, 0.0, "idle", {
"n_active": 0, "target_name": "",
"target_x": 0.0, "target_y": 0.0,
"drive_x": dog_xy[0], "drive_y": dog_xy[1],
}
name, sx, sy = min(
active,
key=lambda s: math.hypot(s[1] - pen_target[0], s[2] - pen_target[1]),
)
ux, uy = _unit(sx - pen_target[0], sy - pen_target[1])
tx = sx + DELTA_DRIVE * ux
ty = sy + DELTA_DRIVE * uy
ax, ay = _unit(tx - dog_xy[0], ty - dog_xy[1])
return APPROACH_GAIN * ax, APPROACH_GAIN * ay, f"drive:{name}", {
"n_active": len(active), "target_name": name,
"target_x": sx, "target_y": sy,
"drive_x": tx, "drive_y": ty,
}
+21 -38
View File
@@ -10,7 +10,10 @@ Reference: Strömbom et al. 2014, "Solving the shepherding problem."
import math import math
from herding.world.geometry import PEN_ENTRY, GATE_Y, in_pen from herding.world.geometry import (
FIELD_ROUND_R, FIELD_SHAPE,
PEN_ENTRY, GATE_Y, in_pen,
)
F_FACTOR = 4.0 # collect/drive threshold scaled by √n F_FACTOR = 4.0 # collect/drive threshold scaled by √n
DELTA_COLLECT = 1.5 # drive-position offset behind the furthest sheep DELTA_COLLECT = 1.5 # drive-position offset behind the furthest sheep
@@ -54,42 +57,22 @@ def compute_action(dog_xy, sheep_positions, pen_target=PEN_ENTRY):
tx, ty = com_x + DELTA_DRIVE * ux, com_y + DELTA_DRIVE * uy tx, ty = com_x + DELTA_DRIVE * ux, com_y + DELTA_DRIVE * uy
mode = "drive" mode = "drive"
# Round-field wall fallback: if the drive target lies outside the
# curved boundary, push the flock radially inward first so it
# leaves the wall — otherwise the dog ends up tangent to the wall
# and the flock circles indefinitely.
if FIELD_SHAPE == "field_round" and mode == "drive":
if math.hypot(tx, ty) > FIELD_ROUND_R - 1.0:
r_com = math.hypot(com_x, com_y)
if r_com > 1e-3:
ux2, uy2 = com_x / r_com, com_y / r_com
tx = com_x + DELTA_DRIVE * ux2
ty = com_y + DELTA_DRIVE * uy2
r_t = math.hypot(tx, ty)
if r_t > FIELD_ROUND_R - 1.0:
scale = (FIELD_ROUND_R - 1.0) / r_t
tx *= scale
ty *= scale
ax, ay = _unit(tx - dog_xy[0], ty - dog_xy[1]) ax, ay = _unit(tx - dog_xy[0], ty - dog_xy[1])
return ax, ay, mode return ax, ay, mode
def compute_action_debug(dog_xy, sheep_positions, pen_target=PEN_ENTRY):
"""``compute_action`` plus a small debug dict (CoM, target, radius)."""
active = [(x, y) for (x, y) in sheep_positions.values() if _is_active(x, y)]
if not active:
return 0.0, 0.0, "idle", {
"n_active": 0, "radius": 0.0, "threshold": 0.0,
"com_x": 0.0, "com_y": 0.0,
"target_x": dog_xy[0], "target_y": dog_xy[1],
}
n = len(active)
com_x = sum(p[0] for p in active) / n
com_y = sum(p[1] for p in active) / n
dists = [math.hypot(p[0] - com_x, p[1] - com_y) for p in active]
radius = max(dists)
threshold = F_FACTOR * math.sqrt(n)
if radius > threshold:
idx = max(range(n), key=lambda i: dists[i])
sx, sy = active[idx]
ux, uy = _unit(sx - com_x, sy - com_y)
tx, ty = sx + DELTA_COLLECT * ux, sy + DELTA_COLLECT * uy
mode = "collect"
else:
ux, uy = _unit(com_x - pen_target[0], com_y - pen_target[1])
tx, ty = com_x + DELTA_DRIVE * ux, com_y + DELTA_DRIVE * uy
mode = "drive"
ax, ay = _unit(tx - dog_xy[0], ty - dog_xy[1])
dbg = {
"n_active": n, "radius": radius, "threshold": threshold,
"com_x": com_x, "com_y": com_y,
"target_x": tx, "target_y": ty,
}
return ax, ay, mode, dbg
+43 -21
View File
@@ -29,6 +29,7 @@ For differential drive ``omega`` is always 0.0 and can be ignored.
import math import math
from herding.world.geometry import ( from herding.world.geometry import (
FIELD_ROUND_R, FIELD_SHAPE,
PEN_ENTRY, GATE_X, GATE_Y, in_pen, PEN_ENTRY, GATE_X, GATE_Y, in_pen,
) )
@@ -43,9 +44,14 @@ DELTA_DRIVE = 2.0 # standoff behind flock CoM
# Omega gain for mecanum (how strongly the dog turns to face target) # Omega gain for mecanum (how strongly the dog turns to face target)
OMEGA_GAIN = 0.6 OMEGA_GAIN = 0.6
# Recovery: push the last straggler straight through the gate. # Recovery: push small flocks (≤ RECOVERY_MAX_N) through the gate one
RECOVERY_GATE_DIST = 6.0 # only when straggler is this close to gate centre # sheep at a time. n=1 alone is not enough — at n=2..3 on the round
RECOVERY_PUSH_DIST = 1.2 # stand-off behind straggler, away from gate # field the flock is too small to self-cohere through the 3 m gate but
# the standard collect/drive standoff just orbits them. Push the sheep
# nearest the gate first; once it pens, the rule re-applies to the next.
RECOVERY_MAX_N = 3
RECOVERY_GATE_DIST = 8.0 # only when target sheep is this close to gate
RECOVERY_PUSH_DIST = 1.2 # stand-off behind sheep, away from gate
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
@@ -114,11 +120,19 @@ def compute_action(dog_xy, dog_heading, sheep_positions,
dists = [math.hypot(p[0] - com_x, p[1] - com_y) for p in active] dists = [math.hypot(p[0] - com_x, p[1] - com_y) for p in active]
radius = max(dists) radius = max(dists)
# ---- Last-straggler recovery (single sheep circling near gate) ---- # ---- Small-flock recovery (push sheep through the gate one by one) ----
# Triggers when the active flock is small (≤ RECOVERY_MAX_N) and the
# sheep nearest the gate is close enough that direct pushing works.
# For larger flocks the standard collect/drive logic handles them.
gc = _gate_center() gc = _gate_center()
if n == 1: if n <= RECOVERY_MAX_N:
sx, sy = active[0] # Pick the sheep closest to the gate as the recovery target —
d_to_gate = math.hypot(sx - gc[0], sy - gc[1]) # finishing that one first reduces the active count and lets the
# remaining sheep get their own recovery turn.
gate_dists = [math.hypot(p[0] - gc[0], p[1] - gc[1]) for p in active]
target_idx = min(range(n), key=lambda i: gate_dists[i])
sx, sy = active[target_idx]
d_to_gate = gate_dists[target_idx]
if d_to_gate < RECOVERY_GATE_DIST: if d_to_gate < RECOVERY_GATE_DIST:
dx_g = sx - gc[0] dx_g = sx - gc[0]
dy_g = sy - gc[1] dy_g = sy - gc[1]
@@ -158,6 +172,28 @@ def compute_action(dog_xy, dog_heading, sheep_positions,
mode = "drive" mode = "drive"
face_target = pen_target face_target = pen_target
# On the round field the natural "behind the flock" point can fall
# outside the curved wall when the flock CoM is itself close to the
# wall. The dog tries to reach an unreachable target, ends up
# tangent to the wall, and the flock circles indefinitely.
# Fix: when the natural target leaves the field, fall back to
# pushing the flock radially inward toward the centre — break the
# wall-circle pattern, then resume normal pen-direction drive once
# the flock is back in the interior.
if FIELD_SHAPE == "field_round" and mode == "drive":
if math.hypot(tx, ty) > FIELD_ROUND_R - 1.0:
r_com = math.hypot(com_x, com_y)
if r_com > 1e-3:
ux2, uy2 = com_x / r_com, com_y / r_com
tx = com_x + DELTA_DRIVE * ux2
ty = com_y + DELTA_DRIVE * uy2
# Clamp to inside-field radius so the dog target is reachable.
r_t = math.hypot(tx, ty)
if r_t > FIELD_ROUND_R - 1.0:
scale = (FIELD_ROUND_R - 1.0) / r_t
tx *= scale
ty *= scale
ax, ay = _unit(tx - dog_xy[0], ty - dog_xy[1]) ax, ay = _unit(tx - dog_xy[0], ty - dog_xy[1])
# ---- Omega (mecanum only) ---- # ---- Omega (mecanum only) ----
@@ -171,17 +207,3 @@ def compute_action(dog_xy, dog_heading, sheep_positions,
omega = max(-1.0, min(1.0, OMEGA_GAIN * err / math.pi)) omega = max(-1.0, min(1.0, OMEGA_GAIN * err / math.pi))
return ax, ay, omega, mode return ax, ay, omega, mode
def compute_action_diff(dog_xy, dog_heading, sheep_positions,
pen_target=PEN_ENTRY):
"""Compatibility wrapper returning ``(vx, vy, mode)`` — same as Strömbom.
Use this when plugging into existing differential-drive code that
doesn't expect omega.
"""
vx, vy, _omega, mode = compute_action(
dog_xy, dog_heading, sheep_positions, pen_target,
drive_mode="differential",
)
return vx, vy, mode
+52 -18
View File
@@ -21,9 +21,13 @@ The downstream tracker handles association across frames.
from __future__ import annotations from __future__ import annotations
import math import math
from typing import TYPE_CHECKING
import numpy as np import numpy as np
if TYPE_CHECKING:
from herding.config import DetectionConfig, LidarConfig
from herding.world.geometry import ( from herding.world.geometry import (
FIELD_SHAPE, FIELD_ROUND_R, FIELD_SHAPE, FIELD_ROUND_R,
FIELD_X, FIELD_Y, GATE_X, GATE_Y, FIELD_X, FIELD_Y, GATE_X, GATE_Y,
@@ -79,21 +83,22 @@ def _in_field_region(cx: float, cy: float) -> bool:
FIELD_Y[0] - 0.2 < cy < FIELD_Y[1] + 0.2) FIELD_Y[0] - 0.2 < cy < FIELD_Y[1] + 0.2)
def _near_wall(cx: float, cy: float) -> bool: def _near_wall(cx: float, cy: float, wall_reject: float = WALL_REJECT) -> bool:
"""True if the detection is too close to a wall to be a sheep.""" """True if the detection is too close to a wall to be a sheep."""
if FIELD_SHAPE == "field_round": if FIELD_SHAPE == "field_round":
r = math.hypot(cx, cy) r = math.hypot(cx, cy)
return r > FIELD_ROUND_R - WALL_REJECT return r > FIELD_ROUND_R - wall_reject
return ( return (
cx > FIELD_X[1] - WALL_REJECT or cx < FIELD_X[0] + WALL_REJECT or cx > FIELD_X[1] - wall_reject or cx < FIELD_X[0] + wall_reject or
cy > FIELD_Y[1] - WALL_REJECT or cy > FIELD_Y[1] - wall_reject or
(cy < FIELD_Y[0] + WALL_REJECT and not (PEN_X[0] <= cx <= PEN_X[1])) (cy < FIELD_Y[0] + wall_reject and not (PEN_X[0] <= cx <= PEN_X[1]))
) )
def _split_cluster_by_range( def _split_cluster_by_range(
points: list[tuple[float, float]], points: list[tuple[float, float]],
range_vals: list[float], range_vals: list[float],
split_range_gap: float = SPLIT_RANGE_GAP,
) -> list[list[tuple[float, float]]]: ) -> list[list[tuple[float, float]]]:
"""Split a cluster at range-profile local maxima (gaps between sheep). """Split a cluster at range-profile local maxima (gaps between sheep).
@@ -108,7 +113,7 @@ def _split_cluster_by_range(
# Find the maximum range (the dip/gap between sheep). # Find the maximum range (the dip/gap between sheep).
r_max = max(range_vals) r_max = max(range_vals)
# If the range variation is small, it's a single target. # If the range variation is small, it's a single target.
if r_max - r_min < SPLIT_RANGE_GAP: if r_max - r_min < split_range_gap:
return [points] return [points]
# Find the split point: the index with the maximum range. # Find the split point: the index with the maximum range.
split_idx = range_vals.index(r_max) split_idx = range_vals.index(r_max)
@@ -124,7 +129,7 @@ def _split_cluster_by_range(
(right, range_vals[split_idx + 1:]), (right, range_vals[split_idx + 1:]),
]: ]:
if len(sub_pts) >= 1: if len(sub_pts) >= 1:
result.extend(_split_cluster_by_range(sub_pts, sub_ranges)) result.extend(_split_cluster_by_range(sub_pts, sub_ranges, split_range_gap))
return result if result else [points] return result if result else [points]
@@ -132,14 +137,43 @@ def detections_from_scan(
ranges: np.ndarray, ranges: np.ndarray,
dog_x: float, dog_y: float, dog_heading: float, dog_x: float, dog_y: float, dog_heading: float,
max_range: float = LIDAR_MAX_RANGE, max_range: float = LIDAR_MAX_RANGE,
detection_cfg: "DetectionConfig | None" = None,
lidar_cfg: "LidarConfig | None" = None,
) -> list[tuple[float, float]]: ) -> list[tuple[float, float]]:
"""Return list of (x, y) world-frame sheep position estimates.""" """Return list of (x, y) world-frame sheep position estimates.
Pass ``detection_cfg`` to override clustering/filtering thresholds, or
``lidar_cfg`` to inform the function of a non-default FOV (the number of
rays and FOV are inferred from the length of ``ranges`` and
``lidar_cfg.fov_rad`` respectively).
"""
# Resolve parameters — fall back to module-level constants when no cfg.
if detection_cfg is not None:
gap_thr = detection_cfg.gap_threshold
max_span = detection_cfg.max_cluster_span
hit_eps = detection_cfg.range_hit_eps
split_gap = detection_cfg.split_range_gap
wall_rej = detection_cfg.wall_reject
static_rej = detection_cfg.static_reject
else:
gap_thr = GAP_THRESHOLD
max_span = MAX_CLUSTER_SPAN
hit_eps = RANGE_HIT_EPS
split_gap = SPLIT_RANGE_GAP
wall_rej = WALL_REJECT
static_rej = STATIC_REJECT
sheep_r = lidar_cfg.sheep_radius if lidar_cfg is not None else SHEEP_RADIUS
fov = lidar_cfg.fov_rad if lidar_cfg is not None else LIDAR_FOV
if lidar_cfg is not None:
max_range = lidar_cfg.max_range
ranges = np.asarray(ranges, dtype=np.float32) ranges = np.asarray(ranges, dtype=np.float32)
n_rays = ranges.shape[0] n_rays = ranges.shape[0]
if n_rays == 0: if n_rays == 0:
return [] return []
angles = ray_angles(n_rays, LIDAR_FOV) angles = ray_angles(n_rays, fov)
hit = ranges < max_range - RANGE_HIT_EPS hit = ranges < max_range - hit_eps
world_a = dog_heading + angles world_a = dog_heading + angles
px = dog_x + ranges * np.cos(world_a) px = dog_x + ranges * np.cos(world_a)
@@ -159,7 +193,7 @@ def detections_from_scan(
prev_xy = None prev_xy = None
continue continue
pt = (float(px[i]), float(py[i]), float(ranges[i])) pt = (float(px[i]), float(py[i]), float(ranges[i]))
if prev_xy is not None and math.hypot(pt[0] - prev_xy[0], pt[1] - prev_xy[1]) > GAP_THRESHOLD: if prev_xy is not None and math.hypot(pt[0] - prev_xy[0], pt[1] - prev_xy[1]) > gap_thr:
clusters.append(current) clusters.append(current)
current = [] current = []
current.append(pt) current.append(pt)
@@ -174,7 +208,7 @@ def detections_from_scan(
# Multi-peak splitting. # Multi-peak splitting.
if len(cluster) >= 4: if len(cluster) >= 4:
sub_clusters = _split_cluster_by_range(points_xy, range_vals) sub_clusters = _split_cluster_by_range(points_xy, range_vals, split_gap)
else: else:
sub_clusters = [points_xy] sub_clusters = [points_xy]
@@ -185,24 +219,24 @@ def detections_from_scan(
ys = [p[1] for p in sub] ys = [p[1] for p in sub]
cx, cy = sum(xs) / len(xs), sum(ys) / len(ys) cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
span = math.hypot(max(xs) - min(xs), max(ys) - min(ys)) span = math.hypot(max(xs) - min(xs), max(ys) - min(ys))
if span > MAX_CLUSTER_SPAN: if span > max_span:
continue continue
# Rays hit the front edge of the sheep; offset outward by # Rays hit the front edge of the sheep; offset outward by
# SHEEP_RADIUS along the dog→cluster direction. # sheep_radius along the dog→cluster direction.
dx, dy = cx - dog_x, cy - dog_y dx, dy = cx - dog_x, cy - dog_y
d = math.hypot(dx, dy) d = math.hypot(dx, dy)
if d > 1e-3: if d > 1e-3:
cx += SHEEP_RADIUS * dx / d cx += sheep_r * dx / d
cy += SHEEP_RADIUS * dy / d cy += sheep_r * dy / d
in_main = _in_field_region(cx, cy) in_main = _in_field_region(cx, cy)
in_gate_strip = (PEN_X[0] - 0.2 < cx < PEN_X[1] + 0.2 and in_gate_strip = (PEN_X[0] - 0.2 < cx < PEN_X[1] + 0.2 and
GATE_Y - 1.0 < cy < GATE_Y + 0.2) GATE_Y - 1.0 < cy < GATE_Y + 0.2)
if not (in_main or in_gate_strip): if not (in_main or in_gate_strip):
continue continue
if any(math.hypot(cx - fx, cy - fy) < STATIC_REJECT if any(math.hypot(cx - fx, cy - fy) < static_rej
for fx, fy in _STATIC_FEATURES): for fx, fy in _STATIC_FEATURES):
continue continue
if _near_wall(cx, cy): if _near_wall(cx, cy, wall_rej):
continue continue
detections.append((cx, cy)) detections.append((cx, cy))
return detections return detections
+31 -11
View File
@@ -2,20 +2,25 @@
Raycasts against sheep (discs) and static world geometry. For rectangular Raycasts against sheep (discs) and static world geometry. For rectangular
fields this is axis-aligned walls + gate posts; for round fields it is a fields this is axis-aligned walls + gate posts; for round fields it is a
circular wall + gate posts. The env reproduces the false-positive cluster circular wall + gate posts.
distribution Webots produces from real 3D geometry.
Returns a range array matching the Webots Lidar device: The module-level constants (``LIDAR_N_RAYS``, ``LIDAR_FOV``, etc.) reflect
180 rays, 140° FOV centred on forward, 12 m max range, 5 mm noise. the original 360°/360-ray oracle configuration. Pass a
See ``protos/ShepherdDog.proto``. :class:`~herding.config.LidarConfig` to :func:`simulate_scan` to use a
different spec (e.g. :data:`~herding.config.LIDAR_WEBOTS` for 180-ray/140°
matching the ShepherdDog.proto hardware).
""" """
from __future__ import annotations from __future__ import annotations
import math import math
from typing import TYPE_CHECKING
import numpy as np import numpy as np
if TYPE_CHECKING:
from herding.config import LidarConfig
from herding.world.geometry import ( from herding.world.geometry import (
FIELD_SHAPE, FIELD_ROUND_R, FIELD_SHAPE, FIELD_ROUND_R,
FIELD_X, FIELD_Y, FIELD_X, FIELD_Y,
@@ -192,14 +197,30 @@ def simulate_scan(
noise: float = LIDAR_NOISE, noise: float = LIDAR_NOISE,
max_range: float = LIDAR_MAX_RANGE, max_range: float = LIDAR_MAX_RANGE,
rng: np.random.Generator | None = None, rng: np.random.Generator | None = None,
lidar_cfg: "LidarConfig | None" = None,
) -> np.ndarray: ) -> np.ndarray:
"""Return a (N,) float32 range array. No-hit entries equal ``max_range``. """Return a (N,) float32 range array. No-hit entries equal ``max_range``.
``sheep_xy`` is every sheep (penned or active) in the scene. ``sheep_xy`` is every sheep (penned or active) in the scene.
Pass ``lidar_cfg`` to override the module-level defaults for a single
call (e.g. to use :data:`~herding.config.LIDAR_WEBOTS`).
""" """
ch, sh = math.cos(dog_heading), math.sin(dog_heading) if lidar_cfg is not None:
cos_w = ch * _COS - sh * _SIN n_rays = lidar_cfg.n_rays
sin_w = sh * _COS + ch * _SIN fov = lidar_cfg.fov_rad
max_range = lidar_cfg.max_range
noise = lidar_cfg.noise_std
sheep_r2 = lidar_cfg.sheep_radius ** 2
angles = ray_angles(n_rays, fov)
ch, sh = math.cos(dog_heading), math.sin(dog_heading)
cos_w = ch * np.cos(angles) - sh * np.sin(angles)
sin_w = sh * np.cos(angles) + ch * np.sin(angles)
else:
sheep_r2 = SHEEP_RADIUS ** 2
ch, sh = math.cos(dog_heading), math.sin(dog_heading)
cos_w = ch * _COS - sh * _SIN
sin_w = sh * _COS + ch * _SIN
best = _raycast_static(dog_x, dog_y, cos_w, sin_w) best = _raycast_static(dog_x, dog_y, cos_w, sin_w)
@@ -209,9 +230,8 @@ def simulate_scan(
t = np.outer(sx, cos_w) + np.outer(sy, sin_w) t = np.outer(sx, cos_w) + np.outer(sy, sin_w)
s_dist2 = (sx ** 2 + sy ** 2)[:, None] s_dist2 = (sx ** 2 + sy ** 2)[:, None]
perp2 = s_dist2 - t ** 2 perp2 = s_dist2 - t ** 2
R2 = SHEEP_RADIUS ** 2 hit = (perp2 < sheep_r2) & (t > 0.0)
hit = (perp2 < R2) & (t > 0.0) half = np.sqrt(np.clip(sheep_r2 - perp2, 0.0, None))
half = np.sqrt(np.clip(R2 - perp2, 0.0, None))
candidate = np.where(hit, t - half, np.inf) candidate = np.where(hit, t - half, np.inf)
nearest = candidate.min(axis=0) nearest = candidate.min(axis=0)
np.minimum(best, nearest, out=best) np.minimum(best, nearest, out=best)
+232 -56
View File
@@ -1,29 +1,37 @@
"""Multi-target tracker for LiDAR-detected sheep. """Multi-target tracker for LiDAR-detected sheep.
Greedy nearest-neighbour data association across frames, with a wider Three-stage greedy nearest-neighbour data association:
re-acquisition gate for stale tracks (sheep flee during occlusion and
reappear off-position), plus memory of last-seen positions for sheep
out of FOV. Output is ``{name: (x, y)}`` — Strömbom / Sequential
consume it directly.
When **predictive mode** is enabled (the default), tracks carry a 1. **Consensus promotion**. New detections start as *candidate* tracks
constant-velocity state ``(vx, vy)`` estimated from the last two invisible to ``get_positions``. They must accumulate ``consensus_k``
observations. While a track is occluded its position is extrapolated matches within ``consensus_radius_m`` to promote; candidates that
using this velocity for up to ``PREDICT_STEPS`` frames, keeping the fail to re-confirm within ``consensus_max_age`` steps die. This
teacher's CoM estimate stable during brief losses. After prediction filters one-shot LiDAR phantoms — wall returns, multi-cluster sheep
expires, the track falls back to its last-seen position (static memory) splits, fast-moving sheep position jumps — at the cost of a small
until ``FORGET_STEPS`` deletes it entirely. acquisition latency (~50 ms at the default ``consensus_k=3``).
``consensus_k=1`` disables the stage.
2. **Constant-velocity prediction**. Each track carries a smoothed
``(vx, vy)``. While a track is occluded its position is
extrapolated for up to ``PREDICT_STEPS`` frames, then falls back to
last-seen static memory until ``FORGET_STEPS`` deletes it.
3. **Pen latching**. A track whose estimated position crosses the gate
plane south of ``is_penned`` is marked penned, excluded
from ``get_positions``, and kept indefinitely.
A track is marked penned once its estimated position crosses the gate Output of :meth:`SheepTracker.get_positions` is ``{name: (x, y)}`` —
plane south (``is_penned_position``). Penned tracks are excluded from Strömbom, Sequential and the BC observation builder consume it
``get_positions`` and kept indefinitely. directly.
""" """
from __future__ import annotations from __future__ import annotations
import math import math
from typing import TYPE_CHECKING
from herding.world.geometry import MAX_SHEEP, in_pen, is_penned_position if TYPE_CHECKING:
from herding.config import TrackerConfig
from herding.world.geometry import MAX_SHEEP, in_pen, is_penned
GATE_M = 2.5 # m — primary NN gate (recently observed tracks) GATE_M = 2.5 # m — primary NN gate (recently observed tracks)
@@ -39,33 +47,60 @@ VELOCITY_CLAMP = 1.0 # m/s — max predicted speed (sheep max is ~0.78 m/s)
class Track: class Track:
"""Single track with position, velocity, and age.""" """Single track with position, velocity, and age.
__slots__ = ("x", "y", "vx", "vy", "last_seen", "penned") Attributes
----------
candidate
``True`` while the track has not yet accumulated enough
consensus matches to be visible (``hit_count < consensus_k``).
Candidates are excluded from :meth:`SheepTracker.get_positions`
and from the active/penned counters.
hit_count
Number of detections this track has absorbed since spawn,
used by the consensus filter.
"""
def __init__(self, x: float, y: float, step: int, penned: bool = False): __slots__ = ("x", "y", "vx", "vy", "last_seen", "penned",
"candidate", "hit_count")
def __init__(
self,
x: float,
y: float,
step: int,
penned: bool = False,
candidate: bool = False,
):
self.x = x self.x = x
self.y = y self.y = y
self.vx = 0.0 self.vx = 0.0
self.vy = 0.0 self.vy = 0.0
self.last_seen = step self.last_seen = step
self.penned = penned self.penned = penned
self.candidate = candidate
self.hit_count = 1
@property @property
def age(self) -> int: def age(self) -> int:
"""Not-a-property in the hot loop — callers pass current step.""" """Not-a-property in the hot loop — callers pass current step."""
raise NotImplementedError raise NotImplementedError
def predicted_position(self, current_step: int) -> tuple[float, float]: def predicted_position(
self,
current_step: int,
predict_steps: int = PREDICT_STEPS,
velocity_clamp: float = VELOCITY_CLAMP,
) -> tuple[float, float]:
"""Extrapolated position using constant velocity, clamped.""" """Extrapolated position using constant velocity, clamped."""
dt = current_step - self.last_seen dt = current_step - self.last_seen
if dt <= 0 or dt > PREDICT_STEPS: if dt <= 0 or dt > predict_steps:
return self.x, self.y return self.x, self.y
speed = math.hypot(self.vx, self.vy) speed = math.hypot(self.vx, self.vy)
if speed < 1e-4: if speed < 1e-4:
return self.x, self.y return self.x, self.y
# Clamp extrapolation distance. # Clamp extrapolation distance.
max_d = VELOCITY_CLAMP * dt * 0.016 # steps → seconds max_d = velocity_clamp * dt * 0.016 # steps → seconds
d = min(speed * dt * 0.016, max_d) d = min(speed * dt * 0.016, max_d)
return ( return (
self.x + d * (self.vx / speed), self.x + d * (self.vx / speed),
@@ -93,10 +128,42 @@ class SheepTracker:
Each track is a :class:`Track` with position, velocity estimate, Each track is a :class:`Track` with position, velocity estimate,
last-seen step, and penned flag. last-seen step, and penned flag.
Pass a :class:`~herding.config.TrackerConfig` to override any
module-level defaults without changing this file.
""" """
def __init__(self, gate: float = GATE_M): def __init__(
self.gate = gate self,
gate: float = GATE_M,
tracker_cfg: "TrackerConfig | None" = None,
):
if tracker_cfg is not None:
self.gate = tracker_cfg.gate_m
self._reacquire_gate = tracker_cfg.reacquire_gate_m
self._reacquire_min_age = tracker_cfg.reacquire_min_age
self._penned_gate = tracker_cfg.penned_gate_m
self._forget_steps = tracker_cfg.forget_steps
self._predict_steps = tracker_cfg.predict_steps
self._velocity_clamp = tracker_cfg.velocity_clamp
self._max_new_per_step = tracker_cfg.max_new_tracks_per_step
self._pen_latch_depth = tracker_cfg.pen_latch_depth
self._consensus_k = tracker_cfg.consensus_k
self._consensus_radius = tracker_cfg.consensus_radius_m
self._consensus_max_age = tracker_cfg.consensus_max_age
else:
self.gate = gate
self._reacquire_gate = REACQUIRE_GATE_M
self._reacquire_min_age = REACQUIRE_MIN_AGE
self._penned_gate = PENNED_GATE_M
self._forget_steps = FORGET_STEPS
self._predict_steps = PREDICT_STEPS
self._velocity_clamp = VELOCITY_CLAMP
self._max_new_per_step = MAX_ACTIVE_TRACKS
self._pen_latch_depth = 0.0
self._consensus_k = 1
self._consensus_radius = 0.5
self._consensus_max_age = 8
self._tracks: dict[int, Track] = {} self._tracks: dict[int, Track] = {}
self._next_id = 0 self._next_id = 0
self.step = 0 self.step = 0
@@ -113,14 +180,17 @@ class SheepTracker:
det_used: set[int] = set() det_used: set[int] = set()
updated_tids: set[int] = set() updated_tids: set[int] = set()
# Pass 1 — match active tracks within the primary gate. # Pass 1 — match promoted active tracks within the primary gate.
# Use predicted positions for matching, oldest-first. # Use predicted positions for matching, oldest-first. Candidates
active_tids = [tid for tid, t in self._tracks.items() if not t.penned] # are excluded; they get their own (tighter) pass below so a
# stray detection cannot rescue an already-stale candidate.
active_tids = [tid for tid, t in self._tracks.items()
if not t.penned and not t.candidate]
active_tids.sort(key=lambda tid: self._tracks[tid].last_seen) active_tids.sort(key=lambda tid: self._tracks[tid].last_seen)
for tid in active_tids: for tid in active_tids:
track = self._tracks[tid] track = self._tracks[tid]
# Use predicted position for matching. tx, ty = track.predicted_position(
tx, ty = track.predicted_position(self.step) self.step, self._predict_steps, self._velocity_clamp)
best_j, best_d = -1, self.gate best_j, best_d = -1, self.gate
for j, (dx, dy) in enumerate(detections): for j, (dx, dy) in enumerate(detections):
if j in det_used: if j in det_used:
@@ -132,6 +202,7 @@ class SheepTracker:
if best_j >= 0: if best_j >= 0:
dx, dy = detections[best_j] dx, dy = detections[best_j]
track.update(dx, dy, self.step) track.update(dx, dy, self.step)
track.hit_count += 1
det_used.add(best_j) det_used.add(best_j)
updated_tids.add(tid) updated_tids.add(tid)
@@ -140,10 +211,11 @@ class SheepTracker:
if tid in updated_tids: if tid in updated_tids:
continue continue
track = self._tracks[tid] track = self._tracks[tid]
if (self.step - track.last_seen) < REACQUIRE_MIN_AGE: if (self.step - track.last_seen) < self._reacquire_min_age:
continue continue
tx, ty = track.predicted_position(self.step) tx, ty = track.predicted_position(
best_j, best_d = -1, REACQUIRE_GATE_M self.step, self._predict_steps, self._velocity_clamp)
best_j, best_d = -1, self._reacquire_gate
for j, (dx, dy) in enumerate(detections): for j, (dx, dy) in enumerate(detections):
if j in det_used: if j in det_used:
continue continue
@@ -154,14 +226,18 @@ class SheepTracker:
if best_j >= 0: if best_j >= 0:
dx, dy = detections[best_j] dx, dy = detections[best_j]
track.update(dx, dy, self.step) track.update(dx, dy, self.step)
track.hit_count += 1
det_used.add(best_j) det_used.add(best_j)
updated_tids.add(tid) updated_tids.add(tid)
# Pass 2 — match remaining detections to penned tracks. # Pass 1c — match remaining detections to candidate tracks within
penned_tids = [tid for tid, t in self._tracks.items() if t.penned] # the tight consensus radius. Each hit ages the candidate; once
for tid in penned_tids: # hit_count reaches consensus_k it is promoted (handled below).
candidate_tids = [tid for tid, t in self._tracks.items() if t.candidate]
candidate_tids.sort(key=lambda tid: self._tracks[tid].last_seen)
for tid in candidate_tids:
track = self._tracks[tid] track = self._tracks[tid]
best_j, best_d = -1, PENNED_GATE_M best_j, best_d = -1, self._consensus_radius
for j, (dx, dy) in enumerate(detections): for j, (dx, dy) in enumerate(detections):
if j in det_used: if j in det_used:
continue continue
@@ -172,33 +248,98 @@ class SheepTracker:
if best_j >= 0: if best_j >= 0:
dx, dy = detections[best_j] dx, dy = detections[best_j]
track.update(dx, dy, self.step) track.update(dx, dy, self.step)
track.hit_count += 1
det_used.add(best_j) det_used.add(best_j)
# Spawn new tracks for unmatched detections. # Pass 2 — match remaining detections to penned tracks.
penned_tids = [tid for tid, t in self._tracks.items() if t.penned]
for tid in penned_tids:
track = self._tracks[tid]
best_j, best_d = -1, self._penned_gate
for j, (dx, dy) in enumerate(detections):
if j in det_used:
continue
d = math.hypot(dx - track.x, dy - track.y)
if d < best_d:
best_d = d
best_j = j
if best_j >= 0:
dx, dy = detections[best_j]
track.update(dx, dy, self.step)
track.hit_count += 1
det_used.add(best_j)
# Spawn tracks for still-unmatched detections.
#
# When ``consensus_k > 1`` every new track starts as a candidate
# and remains invisible to ``get_positions`` until it accumulates
# the required matches. Penned latching is deferred to after
# promotion — otherwise gate-area phantoms could still skip the
# consensus filter by landing inside the pen column and being
# latched forever, which is exactly the failure mode the filter
# is meant to eliminate. ``max_new_tracks_per_step`` continues
# to rate-cap spawns.
spawned = 0
spawn_candidates = self._consensus_k > 1
for j, (dx, dy) in enumerate(detections): for j, (dx, dy) in enumerate(detections):
if j in det_used: if j in det_used:
continue continue
penned = in_pen(dx, dy) or is_penned_position(dx, dy) if spawned >= self._max_new_per_step:
self._tracks[self._next_id] = Track(dx, dy, self.step, penned) break
if spawn_candidates:
self._tracks[self._next_id] = Track(
dx, dy, self.step, penned=False, candidate=True)
else:
penned = self._is_penned(dx, dy)
self._tracks[self._next_id] = Track(
dx, dy, self.step, penned=penned, candidate=False)
self._next_id += 1 self._next_id += 1
spawned += 1
# Promote candidates that have accumulated enough matches.
for track in self._tracks.values():
if track.candidate and track.hit_count >= self._consensus_k:
track.candidate = False
# Promote active tracks whose current estimate crosses the gate. # Promote active tracks whose current estimate crosses the gate.
# Candidates are deliberately excluded — a track that hasn't yet
# earned visibility shouldn't be allowed to latch as penned
# either (that path is exactly how south-wall FPs persisted
# forever before the consensus filter existed).
for track in self._tracks.values(): for track in self._tracks.values():
if track.penned: if track.penned or track.candidate:
continue continue
px, py = track.predicted_position(self.step) px, py = track.predicted_position(
if is_penned_position(px, py): self.step, self._predict_steps, self._velocity_clamp)
if self._is_penned(px, py):
track.penned = True track.penned = True
# Forget stale active tracks; penned tracks live forever. # Forget stale tracks. Candidates have their own short timeout
stale = [tid for tid, t in self._tracks.items() # (one window to confirm or die); promoted active tracks decay at
if not t.penned and (self.step - t.last_seen) > FORGET_STEPS] # forget_steps; penned tracks decay 8× slower because real penned
# sheep are still observed when the dog faces the pen.
penned_forget = self._forget_steps * 8
stale: list[int] = []
for tid, t in self._tracks.items():
age = self.step - t.last_seen
if t.candidate:
if age > self._consensus_max_age:
stale.append(tid)
elif t.penned:
if age > penned_forget:
stale.append(tid)
else:
if age > self._forget_steps:
stale.append(tid)
for tid in stale: for tid in stale:
del self._tracks[tid] del self._tracks[tid]
# Hard cap on the active set — drop the oldest-seen overflow. # Hard cap on the visible (promoted, not penned) active set —
# drop the oldest-seen overflow. Candidates are not counted here:
# they don't compete for slots until they earn promotion, and
# rate-limiting their spawn is the job of ``max_new_per_step``.
active = [(tid, t.last_seen) for tid, t in self._tracks.items() active = [(tid, t.last_seen) for tid, t in self._tracks.items()
if not t.penned] if not t.penned and not t.candidate]
if len(active) > MAX_ACTIVE_TRACKS: if len(active) > MAX_ACTIVE_TRACKS:
active.sort(key=lambda kv: kv[1]) active.sort(key=lambda kv: kv[1])
for tid, _ in active[: len(active) - MAX_ACTIVE_TRACKS]: for tid, _ in active[: len(active) - MAX_ACTIVE_TRACKS]:
@@ -206,18 +347,46 @@ class SheepTracker:
return self.get_positions() return self.get_positions()
def get_positions(self) -> dict[str, tuple[float, float]]: def _is_penned(self, x: float, y: float) -> bool:
"""Active (not-penned) tracks as a ``{name: (x, y)}`` dict. """Check whether a position should be considered penned.
Uses ``pen_latch_depth`` to require the position to be that many
metres past the gate line before latching. Increasing the depth
prevents gate-area LiDAR false positives (gate hardware reflections
at y ≈ -15) from being permanently latched as penned tracks.
"""
from herding.world.geometry import GATE_Y
# Apply depth threshold to both in_pen and is_penned so
# that any position in the gate column must clear GATE_Y - depth.
threshold = GATE_Y - self._pen_latch_depth
return (in_pen(x, y) or is_penned(x, y)) and y <= threshold
def get_positions(self, min_freshness: int | None = None) -> dict[str, tuple[float, float]]:
"""Promoted (non-candidate, non-penned) tracks as ``{name: (x, y)}``.
For tracks currently being predicted (occluded but within For tracks currently being predicted (occluded but within
PREDICT_STEPS), returns the extrapolated position so the teacher predict_steps), returns the extrapolated position so the teacher
sees a smooth estimate. sees a smooth estimate.
Candidate tracks — those that have not yet accumulated
``consensus_k`` matches — are excluded so a one-shot phantom
detection never reaches the policy/teacher.
``min_freshness`` (optional, deploy-only): drop tracks whose
last_seen is older than ``step - min_freshness``. Real sheep in
FOV are detected nearly every step; phantom tracks from sporadic
Webots FPs stop being re-observed and decay. Default ``None``
preserves training behaviour (extrapolated tracks visible).
""" """
result = {} result = {}
for tid, track in self._tracks.items(): for tid, track in self._tracks.items():
if track.penned: if track.penned or track.candidate:
continue continue
px, py = track.predicted_position(self.step) if (min_freshness is not None
and self.step - track.last_seen > min_freshness):
continue
px, py = track.predicted_position(
self.step, self._predict_steps, self._velocity_clamp)
result[f"t{tid}"] = (px, py) result[f"t{tid}"] = (px, py)
return result return result
@@ -225,13 +394,20 @@ class SheepTracker:
return {f"t{tid}" for tid, t in self._tracks.items() if t.penned} return {f"t{tid}" for tid, t in self._tracks.items() if t.penned}
def n_active(self) -> int: def n_active(self) -> int:
return sum(1 for t in self._tracks.values() if not t.penned) """Number of promoted (non-candidate, non-penned) tracks."""
return sum(1 for t in self._tracks.values()
if not t.penned and not t.candidate)
def n_penned(self) -> int: def n_penned(self) -> int:
return sum(1 for t in self._tracks.values() if t.penned) return sum(1 for t in self._tracks.values() if t.penned)
def n_candidate(self) -> int:
"""Number of unpromoted candidate tracks awaiting consensus."""
return sum(1 for t in self._tracks.values() if t.candidate)
def n_predicted(self) -> int: def n_predicted(self) -> int:
"""Number of active tracks currently being extrapolated (not directly observed).""" """Number of promoted active tracks currently being extrapolated (not directly observed)."""
return sum(1 for t in self._tracks.values() return sum(1 for t in self._tracks.values()
if not t.penned and (self.step - t.last_seen) > 0 if not t.penned and not t.candidate
and (self.step - t.last_seen) <= PREDICT_STEPS) and (self.step - t.last_seen) > 0
and (self.step - t.last_seen) <= self._predict_steps)
+50 -6
View File
@@ -2,14 +2,22 @@
controllers. controllers.
First-order rigid-body model — no slip, wheel-accel limits, or contact First-order rigid-body model — no slip, wheel-accel limits, or contact
forces. Webots' ODE physics handles those at inference; the env stays forces by default. Pass ``slip_std`` and an ``rng`` to
close enough to first order that a policy trained here transfers. :func:`kinematics_step` / :func:`mecanum_step` to add
per-wheel Gaussian speed noise for domain randomisation.
""" """
from __future__ import annotations
import math import math
from typing import Optional
import numpy as np
def kinematics_step(x, y, h, w_left, w_right, wheel_radius, wheel_base, dt): def kinematics_step(x, y, h, w_left, w_right, wheel_radius, wheel_base, dt,
slip_std: float = 0.0,
rng: Optional[np.random.Generator] = None):
"""Integrate one step of differential-drive forward kinematics. """Integrate one step of differential-drive forward kinematics.
Inputs Inputs
@@ -19,9 +27,15 @@ def kinematics_step(x, y, h, w_left, w_right, wheel_radius, wheel_base, dt):
w_left, w_right : wheel angular velocities (rad/s) w_left, w_right : wheel angular velocities (rad/s)
wheel_radius, wheel_base : robot dimensions (m) wheel_radius, wheel_base : robot dimensions (m)
dt : timestep (s) dt : timestep (s)
slip_std : optional Gaussian std (rad/s) added to each wheel speed
rng : numpy Generator for slip noise; required when slip_std > 0
Returns (new_x, new_y, new_h). Returns (new_x, new_y, new_h).
""" """
if slip_std > 0.0 and rng is not None:
noise = rng.normal(0.0, slip_std, size=2)
w_left = w_left + noise[0]
w_right = w_right + noise[1]
v = (w_right + w_left) * wheel_radius * 0.5 v = (w_right + w_left) * wheel_radius * 0.5
omega = (w_right - w_left) * wheel_radius / wheel_base omega = (w_right - w_left) * wheel_radius / wheel_base
new_x = x + v * math.cos(h) * dt new_x = x + v * math.cos(h) * dt
@@ -66,8 +80,12 @@ def heading_speed_to_wheels(heading, speed_motor, h, max_wheel_omega,
# Mecanum (4-wheel omnidirectional) kinematics # Mecanum (4-wheel omnidirectional) kinematics
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
def mecanum_kinematics_step(x, y, h, w_fl, w_fr, w_rl, w_rr, def mecanum_step(x, y, h, w_fl, w_fr, w_rl, w_rr,
wheel_radius, lx, ly, dt): wheel_radius, lx, ly, dt,
slip_std: float = 0.0,
rng: Optional[np.random.Generator] = None,
strafe_efficiency: float = 1.0,
strafe_to_forward_bleed: float = 0.0):
"""Integrate one step of mecanum forward kinematics. """Integrate one step of mecanum forward kinematics.
Parameters Parameters
@@ -79,12 +97,38 @@ def mecanum_kinematics_step(x, y, h, w_fl, w_fr, w_rl, w_rr,
lx : half the front-to-back axle distance (m) lx : half the front-to-back axle distance (m)
ly : half the left-to-right axle distance (m) ly : half the left-to-right axle distance (m)
dt : timestep (s) dt : timestep (s)
slip_std : optional Gaussian std (rad/s) added to each wheel speed
rng : numpy Generator for slip noise; required when slip_std > 0
strafe_efficiency : scales the realised lateral (vy_body) velocity.
``1.0`` (default) = perfect mecanum (textbook X-pattern). Set to
the value that matches deployed-platform calibration to train
a policy that compensates for under-actuated strafing — Webots
with the roller-hinge mecanum proto currently calibrates to
~0.4 of textbook on strafe.
strafe_to_forward_bleed : fraction of |vy_body_ideal| added to
vx_body to simulate the consistent body-x bleed-through that
accompanies strafing in Webots' physical-roller mecanum. Use a
*negative* value (Webots calibrates to ≈ -0.28) to model the
backward bleed seen on strafe; positive would model forward
bleed. The bleed magnitude is symmetric in strafe sign — both
+y and -y commands produce the same x-direction error.
Returns (new_x, new_y, new_h). Returns (new_x, new_y, new_h).
""" """
if slip_std > 0.0 and rng is not None:
noise = rng.normal(0.0, slip_std, size=4)
w_fl, w_fr = w_fl + noise[0], w_fr + noise[1]
w_rl, w_rr = w_rl + noise[2], w_rr + noise[3]
r = wheel_radius r = wheel_radius
vx_body = (w_fl + w_fr + w_rl + w_rr) * r / 4.0 vx_body = (w_fl + w_fr + w_rl + w_rr) * r / 4.0
vy_body = (-w_fl + w_fr + w_rl - w_rr) * r / 4.0 vy_body_ideal = (-w_fl + w_fr + w_rl - w_rr) * r / 4.0
vy_body = vy_body_ideal * strafe_efficiency
if strafe_to_forward_bleed != 0.0:
# Bleed-through is asymmetric — forward in body frame, matching
# Webots behaviour where strafe commands push the dog forward
# regardless of strafe sign (the rollers slip the same way
# symmetrically across the body's longitudinal axis).
vx_body += strafe_to_forward_bleed * abs(vy_body_ideal)
omega = (-w_fl + w_fr - w_rl + w_rr) * r / (4.0 * (lx + ly)) omega = (-w_fl + w_fr - w_rl + w_rr) * r / (4.0 * (lx + ly))
cos_h = math.cos(h) cos_h = math.cos(h)
+31 -1
View File
@@ -72,6 +72,36 @@ if FIELD_SHAPE == "field_round":
GATE_Y = FIELD_ROUND_GATE_Y GATE_Y = FIELD_ROUND_GATE_Y
def configure_from_args(argv: list[str] | None = None) -> str:
"""Parse ``--world`` from *argv* (or ``sys.argv[1:]``), call :func:`configure`,
and set ``HERDING_WORLD`` in the environment.
Returns the resolved world name (``"field"`` or ``"field_round"``).
Call this at the very top of any script that accepts a ``--world`` flag,
*before* importing anything from ``herding.*`` that depends on field
geometry. This centralises the pre-parse logic that was previously
duplicated in ``bc/collect.py``, ``rl/train.py``, and ``eval.py``::
from herding.world.geometry import configure_from_args
configure_from_args() # reads sys.argv
"""
import os
import sys as _sys
args = argv if argv is not None else _sys.argv[1:]
world = "field"
for i, a in enumerate(args):
if a == "--world" and i + 1 < len(args):
world = args[i + 1]
break
if a.startswith("--world="):
world = a.split("=", 1)[1]
break
configure(world)
os.environ["HERDING_WORLD"] = world
return world
def configure(shape: str) -> None: def configure(shape: str) -> None:
"""Switch the active field geometry at runtime. """Switch the active field geometry at runtime.
@@ -141,7 +171,7 @@ def in_gate_corridor(x: float, y: float, margin: float = 0.0) -> bool:
and PEN_Y[0] - margin <= y <= GATE_Y + margin) and PEN_Y[0] - margin <= y <= GATE_Y + margin)
def is_penned_position(x: float, y: float, latch_margin: float = 0.2) -> bool: def is_penned(x: float, y: float, latch_margin: float = 0.2) -> bool:
"""True iff (x, y) is in the gate column and south of the gate line.""" """True iff (x, y) is in the gate column and south of the gate line."""
return (GATE_X[0] - latch_margin <= x <= GATE_X[1] + latch_margin return (GATE_X[0] - latch_margin <= x <= GATE_X[1] + latch_margin
and y <= GATE_Y) and y <= GATE_Y)
+2 -1
View File
@@ -138,7 +138,8 @@ PROTO ShepherdDog [
} }
] ]
} }
# Lidar front-facing 140° FOV, mounted at snout tip # Lidar front-facing 140° FOV (canonical hardware spec).
# See ShepherdDog360.proto for the 360° robustness-ablation variant.
Lidar { Lidar {
translation 0.05 0 0.01 translation 0.05 0 0.01
name "lidar" name "lidar"
+691
View File
@@ -0,0 +1,691 @@
#VRML_SIM R2025a utf8
# Shepherd Dog Robot - wheeled base with dog character on top, tail-mounted 360 lidar
EXTERNPROTO "https://raw.githubusercontent.com/cyberbotics/webots/R2025a/projects/appearances/protos/TireRubber.proto"
PROTO ShepherdDog360 [
field SFVec3f translation 0 0 0
field SFRotation rotation 0 1 0 0
field SFString name "ShepherdDog"
field SFString controller "shepherd_dog"
field MFString controllerArgs []
field SFString customData ""
field SFBool supervisor FALSE
field SFBool synchronization TRUE
]
{
Robot {
translation IS translation
rotation IS rotation
name IS name
controller IS controller
controllerArgs IS controllerArgs
customData IS customData
supervisor IS supervisor
synchronization IS synchronization
children [
# ========== CHASSIS / BASE ==========
DEF CHASSIS Transform {
translation 0 0 0.05
children [
Shape {
appearance DEF CHASSIS_APP PBRAppearance {
baseColor 0.2 0.2 0.2
roughness 0.6
metalness 0.3
}
geometry Box {
size 0.32 0.16 0.06
}
}
]
}
# Front slope
DEF CHASSIS_FRONT Transform {
translation 0.14 0 0.07
children [
Shape {
appearance USE CHASSIS_APP
geometry Box {
size 0.06 0.14 0.04
}
}
]
}
# Rear slope
DEF CHASSIS_REAR Transform {
translation -0.14 0 0.07
children [
Shape {
appearance USE CHASSIS_APP
geometry Box {
size 0.06 0.14 0.04
}
}
]
}
# ========== DOG BODY on top of chassis ==========
DEF BODY Transform {
translation 0 0 0.11
children [
Shape {
appearance DEF FUR_BROWN PBRAppearance {
baseColor 0.55 0.35 0.17
roughness 0.85
metalness 0.0
}
geometry Box {
size 0.30 0.16 0.08
}
}
]
}
# ========== CHEST ==========
DEF CHEST Transform {
translation 0.12 0 0.11
children [
Shape {
appearance DEF FUR_CREAM PBRAppearance {
baseColor 0.85 0.72 0.55
roughness 0.85
metalness 0.0
}
geometry Box {
size 0.08 0.18 0.08
}
}
]
}
# ========== HEAD ==========
DEF HEAD Transform {
translation 0.20 0 0.17
children [
Shape {
appearance USE FUR_BROWN
geometry Box {
size 0.10 0.12 0.09
}
}
]
}
# ========== SNOUT + LIDAR ==========
DEF SNOUT Transform {
translation 0.28 0 0.155
children [
Shape {
appearance USE FUR_CREAM
geometry Box {
size 0.08 0.07 0.05
}
}
# Nose
Transform {
translation 0.04 0 0.01
children [
Shape {
appearance PBRAppearance {
baseColor 0.1 0.1 0.1
roughness 0.4
}
geometry Sphere {
radius 0.013
subdivision 2
}
}
]
}
# Lidar 360° FOV (was 140°/2.44 rad). Wider FOV closes the
# FOV-loss perception gap so policies trained on 360° gym sim
# transfer cleanly without retraining.
Lidar {
translation 0.05 0 0.01
name "lidar"
horizontalResolution 360
fieldOfView 6.28
numberOfLayers 1
minRange 0.10
maxRange 15.0
noise 0.005
}
]
}
# ========== LEFT EAR ==========
DEF LEFT_EAR HingeJoint {
jointParameters HingeJointParameters {
axis 0 0 1
anchor 0.19 0.055 0.21
}
device [
RotationalMotor {
name "left ear motor"
maxVelocity 10.0
minPosition -0.5
maxPosition 0.5
}
]
endPoint Solid {
translation 0.19 0.055 0.21
rotation 0 0 1 0.2
name "left ear"
children [
Shape {
appearance DEF FUR_DARK PBRAppearance {
baseColor 0.35 0.20 0.10
roughness 0.85
metalness 0.0
}
geometry Box {
size 0.035 0.025 0.06
}
}
]
boundingObject Box {
size 0.035 0.025 0.06
}
physics Physics {
density -1
mass 0.005
}
}
}
# ========== RIGHT EAR ==========
DEF RIGHT_EAR HingeJoint {
jointParameters HingeJointParameters {
axis 0 0 1
anchor 0.19 -0.055 0.21
}
device [
RotationalMotor {
name "right ear motor"
maxVelocity 10.0
minPosition -0.5
maxPosition 0.5
}
]
endPoint Solid {
translation 0.19 -0.055 0.21
rotation 0 0 -1 0.2
name "right ear"
children [
Shape {
appearance USE FUR_DARK
geometry Box {
size 0.035 0.025 0.06
}
}
]
boundingObject Box {
size 0.035 0.025 0.06
}
physics Physics {
density -1
mass 0.005
}
}
}
# ========== EYES ==========
DEF LEFT_EYE Transform {
translation 0.25 0.05 0.19
children [
Shape {
appearance PBRAppearance {
baseColor 0.95 0.95 0.95
roughness 0.3
}
geometry Sphere {
radius 0.016
subdivision 2
}
}
# Pupil
Transform {
translation 0.012 0 0.004
children [
Shape {
appearance PBRAppearance {
baseColor 0.1 0.1 0.1
roughness 0.2
}
geometry Sphere {
radius 0.009
subdivision 2
}
}
]
}
]
}
DEF RIGHT_EYE Transform {
translation 0.25 -0.05 0.19
children [
Shape {
appearance PBRAppearance {
baseColor 0.95 0.95 0.95
roughness 0.3
}
geometry Sphere {
radius 0.016
subdivision 2
}
}
# Pupil
Transform {
translation 0.012 0 0.004
children [
Shape {
appearance PBRAppearance {
baseColor 0.1 0.1 0.1
roughness 0.2
}
geometry Sphere {
radius 0.009
subdivision 2
}
}
]
}
]
}
# ========== COLLAR ==========
DEF COLLAR Transform {
translation 0.16 0 0.125
children [
Shape {
appearance PBRAppearance {
baseColor 0.8 0.1 0.1
roughness 0.5
}
geometry Cylinder {
height 0.02
radius 0.095
subdivision 16
}
}
# ID tag
Transform {
translation 0 0.10 0
rotation 1 0 0 1.5708
children [
Shape {
appearance PBRAppearance {
baseColor 0.75 0.75 0.0
metalness 0.8
roughness 0.2
}
geometry Cylinder {
height 0.003
radius 0.018
subdivision 8
}
}
]
}
]
}
# ========== TAIL (lidar inside tail tip ball) ==========
DEF TAIL HingeJoint {
jointParameters HingeJointParameters {
axis 0 1 0
anchor -0.15 0 0.11
}
device [
RotationalMotor {
name "tail motor"
maxVelocity 5.0
minPosition -1.0
maxPosition 1.0
}
]
endPoint Solid {
translation -0.17 0 0.13
name "tail solid"
children [
Shape {
appearance USE FUR_BROWN
geometry Capsule {
height 0.12
radius 0.013
top FALSE
}
}
# Tail tip ball
Transform {
translation 0 0 0.08
children [
Shape {
appearance PBRAppearance {
baseColor 0.2 0.2 0.2
roughness 0.3
metalness 0.6
}
geometry Sphere {
radius 0.028
subdivision 4
}
}
]
}
]
boundingObject Group {
children [
Capsule {
height 0.12
radius 0.013
}
Transform {
translation 0 0 0.08
children [
Sphere {
radius 0.028
}
]
}
]
}
physics Physics {
density -1
mass 0.08
}
}
}
# ========== RIGHT AXLE ARM (horizontal bar from chassis to wheel) ==========
DEF RIGHT_AXLE Transform {
translation 0 -0.115 0.038
children [
Shape {
appearance PBRAppearance {
baseColor 0.5 0.5 0.5
roughness 0.3
metalness 0.8
}
geometry Box {
size 0.02 0.08 0.02
}
}
]
}
# ========== LEFT AXLE ARM ==========
DEF LEFT_AXLE Transform {
translation 0 0.115 0.038
children [
Shape {
appearance PBRAppearance {
baseColor 0.5 0.5 0.5
roughness 0.3
metalness 0.8
}
geometry Box {
size 0.02 0.08 0.02
}
}
]
}
# ========== RIGHT WHEEL ==========
DEF RIGHT_WHEEL_JOINT HingeJoint {
jointParameters HingeJointParameters {
axis 0 1 0
anchor 0 -0.14 0.038
}
device [
RotationalMotor {
name "right wheel motor"
maxVelocity 70.0
maxTorque 20.0
}
PositionSensor {
name "right wheel sensor"
resolution 0.00628
}
]
endPoint Solid {
translation 0 -0.14 0.038
rotation 0 -1 0 1.570796
children [
DEF WHEEL_VIS Pose {
rotation 1 0 0 -1.5708
children [
Shape {
appearance PBRAppearance {
baseColor 0.15 0.15 0.15
roughness 0.4
metalness 0.5
}
geometry Cylinder {
height 0.016
radius 0.035
subdivision 24
}
}
Shape {
appearance PBRAppearance {
baseColor 0.6 0.6 0.6
roughness 0.3
metalness 0.7
}
geometry Cylinder {
height 0.018
radius 0.014
subdivision 12
}
}
Shape {
appearance TireRubber {
textureTransform TextureTransform {
scale 1.5 0.6
}
type "bike"
}
geometry Cylinder {
height 0.022
radius 0.038
subdivision 24
}
}
]
}
]
name "right wheel"
boundingObject Pose {
rotation 1 0 0 -1.5708
children [
Cylinder {
height 0.022
radius 0.038
}
]
}
physics Physics {
density -1
mass 0.06
centerOfMass [
0 0 0
]
}
}
}
# ========== LEFT WHEEL ==========
DEF LEFT_WHEEL_JOINT HingeJoint {
jointParameters HingeJointParameters {
axis 0 1 0
anchor 0 0.14 0.038
}
device [
RotationalMotor {
name "left wheel motor"
maxVelocity 70.0
maxTorque 20.0
}
PositionSensor {
name "left wheel sensor"
resolution 0.00628
}
]
endPoint Solid {
translation 0 0.14 0.038
rotation 0.707105 0 0.707109 -3.14159
children [
USE WHEEL_VIS
]
name "left wheel"
boundingObject Pose {
rotation 1 0 0 -1.5708
children [
Cylinder {
height 0.022
radius 0.038
}
]
}
physics Physics {
density -1
mass 0.12
centerOfMass [
0 0 0
]
}
}
}
# ========== FRONT CASTER ==========
DEF FRONT_CASTER BallJoint {
jointParameters BallJointParameters {
anchor 0.14 0 0.02
}
endPoint Solid {
translation 0.14 0 0.02
name "front caster"
children [
Shape {
appearance PBRAppearance {
baseColor 0.2 0.2 0.2
roughness 0.3
metalness 0.5
}
geometry Sphere {
radius 0.02
subdivision 2
}
}
]
boundingObject Sphere {
radius 0.02
}
physics Physics {
density -1
mass 0.03
}
}
}
# ========== REAR CASTER ==========
DEF REAR_CASTER BallJoint {
jointParameters BallJointParameters {
anchor -0.14 0 0.02
}
endPoint Solid {
translation -0.14 0 0.02
name "rear caster"
children [
Shape {
appearance PBRAppearance {
baseColor 0.2 0.2 0.2
roughness 0.3
metalness 0.5
}
geometry Sphere {
radius 0.02
subdivision 2
}
}
]
boundingObject Sphere {
radius 0.02
}
physics Physics {
density -1
mass 0.03
}
}
}
# ========== IMU SENSORS ==========
Accelerometer {
translation 0 0 0.10
name "accelerometer"
}
Gyro {
translation 0 0 0.10
name "gyro"
}
Compass {
translation 0 0 0.10
name "compass"
}
# ========== GPS ==========
GPS {
translation 0 0 0.17
name "gps"
}
# ========== RECEIVER ==========
Receiver {
name "receiver"
channel 1
}
# ========== EMITTER ==========
Emitter {
name "emitter"
channel 1
range 50.0
}
]
# ========== BOUNDING OBJECT ==========
boundingObject Group {
children [
# Chassis box
Transform {
translation 0 0 0.05
children [
Box {
size 0.32 0.16 0.06
}
]
}
# Body box
Transform {
translation 0 0 0.11
children [
Box {
size 0.30 0.16 0.08
}
]
}
]
}
# ========== PHYSICS ==========
physics Physics {
density -1
mass 5.0
centerOfMass [
0 0 0.03
]
}
}
}
File diff suppressed because it is too large Load Diff
File diff suppressed because it is too large Load Diff
+290
View File
@@ -0,0 +1,290 @@
"""Tests for herding/config.py — dataclass construction, defaults, overrides."""
import math
import pytest
from herding.config import (
DetectionConfig,
DomainRandomConfig,
HerdingConfig,
HERDING_DEFAULT,
HERDING_WEBOTS,
LidarConfig,
LIDAR_FULL,
LIDAR_WEBOTS,
RobotConfig,
TrackerConfig,
)
# ---------------------------------------------------------------------------
# LidarConfig
# ---------------------------------------------------------------------------
class TestLidarConfig:
def test_defaults_match_full_circle_preset(self):
assert LidarConfig() == LIDAR_FULL
def test_webots_preset(self):
assert LIDAR_WEBOTS.n_rays == 180
assert abs(LIDAR_WEBOTS.fov_rad - math.radians(140.0)) < 1e-9
def test_frozen(self):
cfg = LidarConfig()
with pytest.raises((AttributeError, TypeError)):
cfg.n_rays = 42 # type: ignore[misc]
def test_invalid_n_rays(self):
with pytest.raises(ValueError):
LidarConfig(n_rays=0)
def test_invalid_fov(self):
with pytest.raises(ValueError):
LidarConfig(fov_rad=0.0)
with pytest.raises(ValueError):
LidarConfig(fov_rad=math.pi * 3)
def test_invalid_max_range(self):
with pytest.raises(ValueError):
LidarConfig(max_range=-1.0)
# ---------------------------------------------------------------------------
# TrackerConfig
# ---------------------------------------------------------------------------
class TestTrackerConfig:
def test_defaults(self):
cfg = TrackerConfig()
assert cfg.forget_steps == 200
assert cfg.max_new_tracks_per_step == 10
def test_webots_preset_tighter(self):
cfg = HERDING_WEBOTS.tracker
# forget_steps was extended so confirmed sheep tracks survive
# sparse 140° FOV re-sightings; consensus blocks phantoms from
# reaching this lifetime.
assert cfg.forget_steps >= 200
assert cfg.max_new_tracks_per_step == 1
assert cfg.pen_latch_depth == 2.0
def test_default_consensus_enabled(self):
# Consensus is on by default — it filters tracker phantoms that
# confused the policy on the round field (52% → 88%) at no cost
# on the rectangular field (100% → 100%). Pass-through (k=1) is
# still available by explicitly constructing TrackerConfig(consensus_k=1).
cfg = TrackerConfig()
assert cfg.consensus_k >= 2
assert cfg.consensus_radius_m > 0.0
assert cfg.consensus_max_age > cfg.consensus_k
def test_webots_preset_enables_consensus(self):
cfg = HERDING_WEBOTS.tracker
assert cfg.consensus_k > 1
assert cfg.consensus_radius_m > 0.0
assert cfg.consensus_max_age >= cfg.consensus_k
def test_invalid_forget_steps(self):
with pytest.raises(ValueError):
TrackerConfig(forget_steps=0)
def test_invalid_max_new_tracks(self):
with pytest.raises(ValueError):
TrackerConfig(max_new_tracks_per_step=0)
def test_invalid_consensus_params(self):
with pytest.raises(ValueError):
TrackerConfig(consensus_k=0)
with pytest.raises(ValueError):
TrackerConfig(consensus_radius_m=0.0)
with pytest.raises(ValueError):
TrackerConfig(consensus_max_age=0)
# ---------------------------------------------------------------------------
# DetectionConfig
# ---------------------------------------------------------------------------
class TestDetectionConfig:
def test_defaults(self):
cfg = DetectionConfig()
assert cfg.wall_reject == 0.5
def test_webots_preset_wall_reject(self):
# wall_reject stays at 0.5 m — 1.0 m was too aggressive near the south gate
cfg = HERDING_WEBOTS.detection
assert cfg.wall_reject == 0.5
def test_invalid_wall_reject(self):
with pytest.raises(ValueError):
DetectionConfig(wall_reject=-0.1)
# ---------------------------------------------------------------------------
# RobotConfig
# ---------------------------------------------------------------------------
class TestRobotConfig:
def test_max_linear_derived(self):
cfg = RobotConfig()
assert abs(cfg.max_linear - cfg.wheel_radius * cfg.max_wheel_omega) < 1e-9
def test_default_action_smooth_zero(self):
assert RobotConfig().action_smooth == 0.0
def test_webots_action_smooth(self):
assert HERDING_WEBOTS.robot.action_smooth == 0.55
def test_invalid_action_smooth(self):
with pytest.raises(ValueError):
RobotConfig(action_smooth=1.0)
with pytest.raises(ValueError):
RobotConfig(action_smooth=-0.1)
def test_default_strafe_passthrough(self):
cfg = RobotConfig()
assert cfg.strafe_efficiency == 1.0
assert cfg.strafe_to_forward_bleed == 0.0
def test_invalid_strafe_efficiency(self):
with pytest.raises(ValueError):
RobotConfig(strafe_efficiency=0.0)
with pytest.raises(ValueError):
RobotConfig(strafe_efficiency=1.5)
with pytest.raises(ValueError):
RobotConfig(strafe_efficiency=-0.1)
def test_mec_webots_preset(self):
from herding.config import HERDING_MEC_WEBOTS
# Mecanum runs deploy via Supervisor kinematic injection
# (controllers/shepherd_dog/shepherd_dog.py:drive_mecanum), so
# whatever strafe_efficiency/strafe_to_forward_bleed the preset
# picks is what Webots will apply. The preset is allowed to be
# textbook (1.0, 0.0) or matched (<1.0, ≠0.0).
cfg = HERDING_MEC_WEBOTS.robot
assert 0.0 < cfg.strafe_efficiency <= 1.0
# ---------------------------------------------------------------------------
# DomainRandomConfig
# ---------------------------------------------------------------------------
class TestDomainRandomConfig:
def test_all_zeros_by_default(self):
cfg = DomainRandomConfig()
assert cfg.fp_rate == 0.0
assert cfg.wheel_slip_std == 0.0
assert cfg.compass_noise_std == 0.0
def test_invalid_fp_rate(self):
with pytest.raises(ValueError):
DomainRandomConfig(fp_rate=-1.0)
def test_invalid_slip_std(self):
with pytest.raises(ValueError):
DomainRandomConfig(wheel_slip_std=-0.01)
# ---------------------------------------------------------------------------
# HerdingConfig
# ---------------------------------------------------------------------------
class TestHerdingConfig:
def test_default_is_herding_default(self):
assert HerdingConfig() == HERDING_DEFAULT
def test_replace_sub_config(self):
new_cfg = HERDING_WEBOTS.replace(
domain_random=DomainRandomConfig(fp_rate=2.0)
)
assert new_cfg.domain_random.fp_rate == 2.0
# Other sub-configs unchanged
assert new_cfg.tracker == HERDING_WEBOTS.tracker
assert new_cfg.lidar == HERDING_WEBOTS.lidar
def test_herding_default_matches_original_module_constants(self):
"""Verify the default config reproduces the original hardcoded values."""
from herding.perception.lidar_sim import (
LIDAR_N_RAYS, LIDAR_FOV, LIDAR_MAX_RANGE, LIDAR_NOISE,
SHEEP_RADIUS, POST_RADIUS,
)
from herding.perception.lidar_perception import (
GAP_THRESHOLD, MAX_CLUSTER_SPAN, RANGE_HIT_EPS,
SPLIT_RANGE_GAP, WALL_REJECT, STATIC_REJECT,
)
from herding.perception.sheep_tracker import (
GATE_M, REACQUIRE_GATE_M, REACQUIRE_MIN_AGE, PENNED_GATE_M,
FORGET_STEPS, PREDICT_STEPS, VELOCITY_CLAMP,
)
cfg = HERDING_DEFAULT
assert cfg.lidar.n_rays == LIDAR_N_RAYS
assert cfg.lidar.fov_rad == LIDAR_FOV
assert cfg.lidar.max_range == LIDAR_MAX_RANGE
assert cfg.lidar.noise_std == LIDAR_NOISE
assert cfg.lidar.sheep_radius == SHEEP_RADIUS
assert cfg.lidar.post_radius == POST_RADIUS
assert cfg.detection.gap_threshold == GAP_THRESHOLD
assert cfg.detection.max_cluster_span == MAX_CLUSTER_SPAN
assert cfg.detection.range_hit_eps == RANGE_HIT_EPS
assert cfg.detection.split_range_gap == SPLIT_RANGE_GAP
assert cfg.detection.wall_reject == WALL_REJECT
assert cfg.detection.static_reject == STATIC_REJECT
assert cfg.tracker.gate_m == GATE_M
assert cfg.tracker.reacquire_gate_m == REACQUIRE_GATE_M
assert cfg.tracker.reacquire_min_age == REACQUIRE_MIN_AGE
assert cfg.tracker.penned_gate_m == PENNED_GATE_M
assert cfg.tracker.forget_steps == FORGET_STEPS
assert cfg.tracker.predict_steps == PREDICT_STEPS
assert cfg.tracker.velocity_clamp == VELOCITY_CLAMP
# ---------------------------------------------------------------------------
# Integration: HerdingEnv honours the config
# ---------------------------------------------------------------------------
class TestHerdingEnvConfig:
def test_default_env_unchanged(self):
"""HerdingEnv() still works with no config — zero behaviour change."""
from training.herding_env import HerdingEnv
env = HerdingEnv(n_sheep=1, max_steps=5, difficulty=1.0, seed=0)
obs, info = env.reset()
assert obs.shape == (32,)
obs2, *_ = env.step(env.action_space.sample())
assert obs2.shape == (32,)
def test_webots_config_propagates_action_smooth(self):
from training.herding_env import HerdingEnv
env = HerdingEnv(herding_cfg=HERDING_WEBOTS)
assert env.ACTION_SMOOTH == 0.55
def test_webots_config_runs(self):
from training.herding_env import HerdingEnv
env = HerdingEnv(
n_sheep=2, max_steps=10, difficulty=1.0, seed=42,
herding_cfg=HERDING_WEBOTS,
)
obs, _ = env.reset()
for _ in range(5):
obs, _, terminated, truncated, _ = env.step(env.action_space.sample())
assert obs.shape == (32,)
def test_domain_random_fp_runs(self):
from training.herding_env import HerdingEnv
cfg = HERDING_WEBOTS.replace(
domain_random=DomainRandomConfig(fp_rate=3.0, fp_std_pos=0.2)
)
env = HerdingEnv(n_sheep=2, max_steps=10, difficulty=1.0, seed=7, herding_cfg=cfg)
env.reset()
for _ in range(5):
env.step(env.action_space.sample())
def test_domain_random_slip_runs(self):
from training.herding_env import HerdingEnv
cfg = HERDING_WEBOTS.replace(
domain_random=DomainRandomConfig(wheel_slip_std=0.05, compass_noise_std=0.02)
)
env = HerdingEnv(n_sheep=1, max_steps=10, difficulty=1.0, seed=3,
drive_mode="mecanum", herding_cfg=cfg)
env.reset()
for _ in range(5):
env.step(env.action_space.sample())
+33 -10
View File
@@ -8,7 +8,7 @@ from herding.control.active_scan import (
EMPTY_DEBOUNCE_STEPS, INITIAL_SCAN_STEPS, ActiveScanTeacher, EMPTY_DEBOUNCE_STEPS, INITIAL_SCAN_STEPS, ActiveScanTeacher,
) )
from herding.control.modulation import ( from herding.control.modulation import (
MIN_SPEED, SLOW_NEAR_SHEEP, modulate_speed_near_sheep, MIN_SPEED, SLOW_NEAR_SHEEP, modulate_speed,
) )
from herding.control.sequential import compute_action as sequential_action from herding.control.sequential import compute_action as sequential_action
from herding.control.strombom import ( from herding.control.strombom import (
@@ -23,23 +23,23 @@ from herding.world.geometry import PEN_ENTRY
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
def test_modulation_empty_input_passthrough(): def test_modulation_empty_input_passthrough():
assert modulate_speed_near_sheep(1.0, 0.0, (0.0, 0.0), []) == (1.0, 0.0) assert modulate_speed(1.0, 0.0, (0.0, 0.0), []) == (1.0, 0.0)
assert modulate_speed_near_sheep(1.0, 0.0, (0.0, 0.0), {}) == (1.0, 0.0) assert modulate_speed(1.0, 0.0, (0.0, 0.0), {}) == (1.0, 0.0)
def test_modulation_far_sheep_passthrough(): def test_modulation_far_sheep_passthrough():
vx, vy = modulate_speed_near_sheep(1.0, 0.0, (0.0, 0.0), [(100.0, 0.0)]) vx, vy = modulate_speed(1.0, 0.0, (0.0, 0.0), [(100.0, 0.0)])
assert (vx, vy) == (1.0, 0.0) assert (vx, vy) == (1.0, 0.0)
def test_modulation_close_sheep_min_speed(): def test_modulation_close_sheep_min_speed():
vx, vy = modulate_speed_near_sheep(1.0, 0.0, (0.0, 0.0), [(0.0, 0.0)]) vx, vy = modulate_speed(1.0, 0.0, (0.0, 0.0), [(0.0, 0.0)])
assert math.isclose(vx, MIN_SPEED) assert math.isclose(vx, MIN_SPEED)
assert vy == 0.0 assert vy == 0.0
def test_modulation_preserves_direction(): def test_modulation_preserves_direction():
vx, vy = modulate_speed_near_sheep(0.6, 0.8, (0.0, 0.0), [(1.0, 0.0)]) vx, vy = modulate_speed(0.6, 0.8, (0.0, 0.0), [(1.0, 0.0)])
ratio = math.hypot(vx, vy) ratio = math.hypot(vx, vy)
# Direction preserved. # Direction preserved.
assert math.isclose(vx / ratio, 0.6, abs_tol=1e-6) assert math.isclose(vx / ratio, 0.6, abs_tol=1e-6)
@@ -47,16 +47,16 @@ def test_modulation_preserves_direction():
def test_modulation_linear_ramp_midpoint(): def test_modulation_linear_ramp_midpoint():
vx, _ = modulate_speed_near_sheep(1.0, 0.0, (0.0, 0.0), vx, _ = modulate_speed(1.0, 0.0, (0.0, 0.0),
[(SLOW_NEAR_SHEEP / 2, 0.0)]) [(SLOW_NEAR_SHEEP / 2, 0.0)])
expected = MIN_SPEED + (1.0 - MIN_SPEED) * 0.5 expected = MIN_SPEED + (1.0 - MIN_SPEED) * 0.5
assert math.isclose(vx, expected, abs_tol=1e-6) assert math.isclose(vx, expected, abs_tol=1e-6)
def test_modulation_accepts_dict_input(): def test_modulation_accepts_dict_input():
vx_list, _ = modulate_speed_near_sheep(1.0, 0.0, (0.0, 0.0), vx_list, _ = modulate_speed(1.0, 0.0, (0.0, 0.0),
[(1.0, 0.0)]) [(1.0, 0.0)])
vx_dict, _ = modulate_speed_near_sheep(1.0, 0.0, (0.0, 0.0), vx_dict, _ = modulate_speed(1.0, 0.0, (0.0, 0.0),
{"t0": (1.0, 0.0)}) {"t0": (1.0, 0.0)})
assert math.isclose(vx_list, vx_dict) assert math.isclose(vx_list, vx_dict)
@@ -106,11 +106,34 @@ def test_sequential_empty_input_idle():
def test_sequential_targets_closest_to_pen(): def test_sequential_targets_closest_to_pen():
# With 2 sheep (≤ STRAGGLER_THRESHOLD), sequential goes straight to
# "targeted" phase and pushes the sheep nearest to the pen.
near = (10.0, -5.0) # closer to pen entry (11.5, -15) near = (10.0, -5.0) # closer to pen entry (11.5, -15)
far = (-10.0, 10.0) far = (-10.0, 10.0)
sheep = {"near": near, "far": far} sheep = {"near": near, "far": far}
vx, vy, mode = sequential_action((0.0, 0.0), sheep, PEN_ENTRY)
assert mode == "targeted"
# Dog should be directed toward near sheep (south-east), not far (north-west).
assert vx > 0 and vy < 0
def test_sequential_collects_when_scattered():
# With >STRAGGLER_THRESHOLD sheep and radius > F_FACTOR*sqrt(n):
# should use collect (Strombom) not targeted.
sheep = {f"s{i}": pos for i, pos in enumerate([
(12.0, 10.0), (-12.0, 10.0), (0.0, 12.0),
(12.0, -12.0), (-10.0, -8.0),
])}
_vx, _vy, mode = sequential_action((0.0, 0.0), sheep, PEN_ENTRY) _vx, _vy, mode = sequential_action((0.0, 0.0), sheep, PEN_ENTRY)
assert mode.startswith("drive:near") assert mode in ("collect", "drive")
def test_sequential_drives_when_compact():
# Compact flock of 5 sheep near centre — should drive, not collect.
sheep = {f"s{i}": (float(i) * 0.3, float(i) * 0.3)
for i in range(5)}
_vx, _vy, mode = sequential_action((0.0, 5.0), sheep, PEN_ENTRY)
assert mode == "drive"
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
+44 -5
View File
@@ -6,7 +6,7 @@ import pytest
from herding.world.diffdrive import ( from herding.world.diffdrive import (
heading_speed_to_wheels, kinematics_step, heading_speed_to_wheels, kinematics_step,
mecanum_inverse, mecanum_kinematics_step, mecanum_inverse, mecanum_step,
velocity_to_mecanum_wheels, velocity_to_wheels, velocity_to_mecanum_wheels, velocity_to_wheels,
) )
@@ -95,7 +95,7 @@ LY = 0.14 # half wheel_base_y
def test_mecanum_kinematics_zero_is_identity(): def test_mecanum_kinematics_zero_is_identity():
x, y, h = mecanum_kinematics_step( x, y, h = mecanum_step(
1.0, 2.0, 0.5, 0.0, 0.0, 0.0, 0.0, WHEEL_R, LX, LY, DT, 1.0, 2.0, 0.5, 0.0, 0.0, 0.0, 0.0, WHEEL_R, LX, LY, DT,
) )
assert (x, y, h) == (1.0, 2.0, 0.5) assert (x, y, h) == (1.0, 2.0, 0.5)
@@ -104,7 +104,7 @@ def test_mecanum_kinematics_zero_is_identity():
def test_mecanum_kinematics_pure_forward(): def test_mecanum_kinematics_pure_forward():
# All 4 wheels equal → pure forward (vx_body > 0, vy_body = 0). # All 4 wheels equal → pure forward (vx_body > 0, vy_body = 0).
w = 10.0 w = 10.0
x, y, h = mecanum_kinematics_step( x, y, h = mecanum_step(
0.0, 0.0, 0.0, w, w, w, w, WHEEL_R, LX, LY, DT, 0.0, 0.0, 0.0, w, w, w, w, WHEEL_R, LX, LY, DT,
) )
assert h == pytest.approx(0.0, abs=1e-9) assert h == pytest.approx(0.0, abs=1e-9)
@@ -118,7 +118,7 @@ def test_mecanum_kinematics_pure_strafe():
# vy_body = (-w_fl+w_fr+w_rl-w_rr)*r/4 > 0 # vy_body = (-w_fl+w_fr+w_rl-w_rr)*r/4 > 0
# Use w_fl=-10, w_fr=10, w_rl=10, w_rr=-10. # Use w_fl=-10, w_fr=10, w_rl=10, w_rr=-10.
w_fl, w_fr, w_rl, w_rr = -10.0, 10.0, 10.0, -10.0 w_fl, w_fr, w_rl, w_rr = -10.0, 10.0, 10.0, -10.0
x, y, h = mecanum_kinematics_step( x, y, h = mecanum_step(
0.0, 0.0, 0.0, w_fl, w_fr, w_rl, w_rr, WHEEL_R, LX, LY, DT, 0.0, 0.0, 0.0, w_fl, w_fr, w_rl, w_rr, WHEEL_R, LX, LY, DT,
) )
assert h == pytest.approx(0.0, abs=1e-9) assert h == pytest.approx(0.0, abs=1e-9)
@@ -127,11 +127,50 @@ def test_mecanum_kinematics_pure_strafe():
assert math.isclose(y, expected_vy * DT, rel_tol=1e-6) assert math.isclose(y, expected_vy * DT, rel_tol=1e-6)
def test_mecanum_kinematics_strafe_efficiency_scales_y():
# With strafe_efficiency=0.4, realised strafe should be 40% of ideal.
w_fl, w_fr, w_rl, w_rr = -10.0, 10.0, 10.0, -10.0
x, y, _ = mecanum_step(
0.0, 0.0, 0.0, w_fl, w_fr, w_rl, w_rr, WHEEL_R, LX, LY, DT,
strafe_efficiency=0.4,
)
ideal_vy = (-w_fl + w_fr + w_rl - w_rr) * WHEEL_R / 4.0
assert math.isclose(y, 0.4 * ideal_vy * DT, rel_tol=1e-6)
assert x == pytest.approx(0.0, abs=1e-9)
def test_mecanum_kinematics_strafe_bleed_pushes_backward():
# Negative bleed means strafe commands also push the body backward.
w_fl, w_fr, w_rl, w_rr = -10.0, 10.0, 10.0, -10.0
x, y, _ = mecanum_step(
0.0, 0.0, 0.0, w_fl, w_fr, w_rl, w_rr, WHEEL_R, LX, LY, DT,
strafe_efficiency=1.0,
strafe_to_forward_bleed=-0.28,
)
ideal_vy = (-w_fl + w_fr + w_rl - w_rr) * WHEEL_R / 4.0
assert math.isclose(y, ideal_vy * DT, rel_tol=1e-6)
expected_x = -0.28 * abs(ideal_vy) * DT
assert math.isclose(x, expected_x, rel_tol=1e-6)
def test_mecanum_kinematics_forward_unaffected_by_strafe_params():
# Forward command should be untouched by strafe_efficiency / bleed.
w_fl = w_fr = w_rl = w_rr = 10.0
x, y, _ = mecanum_step(
0.0, 0.0, 0.0, w_fl, w_fr, w_rl, w_rr, WHEEL_R, LX, LY, DT,
strafe_efficiency=0.4,
strafe_to_forward_bleed=-0.28,
)
expected_vx = (w_fl + w_fr + w_rl + w_rr) * WHEEL_R / 4.0
assert math.isclose(x, expected_vx * DT, rel_tol=1e-6)
assert y == pytest.approx(0.0, abs=1e-9)
def test_mecanum_kinematics_pure_rotation(): def test_mecanum_kinematics_pure_rotation():
# Pure rotation: vx_body=0, vy_body=0, omega>0. # Pure rotation: vx_body=0, vy_body=0, omega>0.
# w_fl=-10, w_fr=10, w_rl=-10, w_rr=10 → all sums cancel except omega. # w_fl=-10, w_fr=10, w_rl=-10, w_rr=10 → all sums cancel except omega.
w_fl, w_fr, w_rl, w_rr = -10.0, 10.0, -10.0, 10.0 w_fl, w_fr, w_rl, w_rr = -10.0, 10.0, -10.0, 10.0
x, y, h = mecanum_kinematics_step( x, y, h = mecanum_step(
0.0, 0.0, 0.0, w_fl, w_fr, w_rl, w_rr, WHEEL_R, LX, LY, DT, 0.0, 0.0, 0.0, w_fl, w_fr, w_rl, w_rr, WHEEL_R, LX, LY, DT,
) )
assert x == pytest.approx(0.0, abs=1e-9) assert x == pytest.approx(0.0, abs=1e-9)
+12 -12
View File
@@ -5,7 +5,7 @@ import math
from herding.world.geometry import ( from herding.world.geometry import (
FIELD_X, FIELD_Y, GATE_X, GATE_Y, MAX_SHEEP, PEN_ENTRY, PEN_X, PEN_Y, FIELD_X, FIELD_Y, GATE_X, GATE_Y, MAX_SHEEP, PEN_ENTRY, PEN_X, PEN_Y,
distance_to_pen_entry, in_field, in_gate_corridor, in_pen, distance_to_pen_entry, in_field, in_gate_corridor, in_pen,
is_penned_position, is_penned,
) )
@@ -44,23 +44,23 @@ def test_in_gate_corridor():
assert not in_gate_corridor(5.0, -18.0) assert not in_gate_corridor(5.0, -18.0)
def test_is_penned_position_latches_below_gate(): def test_is_penned_latches_below_gate():
# In the gate column and south of the gate plane → penned. # In the gate column and south of the gate plane → penned.
assert is_penned_position(11.5, -15.0) assert is_penned(11.5, -15.0)
assert is_penned_position(10.5, -18.0) assert is_penned(10.5, -18.0)
assert is_penned_position(12.5, -22.0) assert is_penned(12.5, -22.0)
# Above the gate plane → not yet. # Above the gate plane → not yet.
assert not is_penned_position(11.5, -14.9) assert not is_penned(11.5, -14.9)
# Outside the gate column → not penned even if south. # Outside the gate column → not penned even if south.
assert not is_penned_position(0.0, -16.0) assert not is_penned(0.0, -16.0)
assert not is_penned_position(14.0, -16.0) assert not is_penned(14.0, -16.0)
def test_is_penned_position_latch_margin(): def test_is_penned_latch_margin():
# Slight tolerance on the gate column. # Slight tolerance on the gate column.
assert is_penned_position(9.9, -15.5) assert is_penned(9.9, -15.5)
assert is_penned_position(13.1, -15.5) assert is_penned(13.1, -15.5)
assert not is_penned_position(9.7, -15.5) assert not is_penned(9.7, -15.5)
def test_distance_to_pen_entry(): def test_distance_to_pen_entry():
+86 -1
View File
@@ -136,7 +136,7 @@ def test_tracker_forgets_stale_tracks():
def test_tracker_penned_position_promotes_track(): def test_tracker_penned_position_promotes_track():
t = SheepTracker() t = SheepTracker()
t.update([(11.5, -16.0)]) # spawn inside the pen column t.update([(11.5, -16.0)]) # spawn inside the pen column
# is_penned_position is True for this point. # is_penned is True for this point.
assert t.n_penned() == 1 assert t.n_penned() == 1
assert t.n_active() == 0 assert t.n_active() == 0
@@ -164,3 +164,88 @@ def test_tracker_reset_clears_state():
t.reset() t.reset()
assert t.n_active() == 0 assert t.n_active() == 0
assert t.step == 0 assert t.step == 0
# ---------------------------------------------------------------------------
# Consensus promotion
# ---------------------------------------------------------------------------
def _tracker_with_consensus(k: int = 3, radius: float = 0.5, max_age: int = 8):
from herding.config import TrackerConfig
return SheepTracker(tracker_cfg=TrackerConfig(
consensus_k=k, consensus_radius_m=radius, consensus_max_age=max_age,
))
def test_consensus_default_disabled():
"""With consensus_k=1 (default) the first detection is immediately visible."""
t = SheepTracker()
t.update([(5.0, 0.0)])
assert t.n_active() == 1
assert len(t.get_positions()) == 1
def test_consensus_hides_one_shot_detection():
"""K>=2: a single detection that never reappears is filtered out."""
t = _tracker_with_consensus(k=3)
t.update([(5.0, 0.0)])
assert t.n_active() == 0 # candidate, not promoted
assert t.n_candidate() == 1
assert t.get_positions() == {}
def test_consensus_promotes_after_k_matches():
"""A real sheep visible for K frames promotes and appears in get_positions."""
t = _tracker_with_consensus(k=3)
for _ in range(3):
t.update([(5.0, 0.0)])
assert t.n_active() == 1
assert t.n_candidate() == 0
assert len(t.get_positions()) == 1
def test_consensus_candidate_expires_quickly():
"""A candidate that fails to re-confirm within consensus_max_age dies."""
t = _tracker_with_consensus(k=3, max_age=5)
t.update([(5.0, 0.0)])
assert t.n_candidate() == 1
for _ in range(6): # > max_age empty frames
t.update([])
assert t.n_candidate() == 0
assert t.n_active() == 0
def test_consensus_tracker_does_not_promote_phantom_pen():
"""A one-shot detection inside the pen column must not latch as penned
while it is still a candidate."""
t = _tracker_with_consensus(k=3)
t.update([(11.5, -16.0)]) # gate-area FP, inside the pen column
# Not promoted, not penned — just a candidate.
assert t.n_penned() == 0
assert t.n_candidate() == 1
# And after one expiry window it disappears entirely.
for _ in range(10):
t.update([])
assert t.n_penned() == 0
assert t.n_candidate() == 0
def test_consensus_distinguishes_real_sheep_from_phantom():
"""Real sheep (continuous detections) promote; phantom (intermittent
detections at jittered positions outside consensus_radius) does not
appear in get_positions even while individual candidates are still
within the max-age window."""
t = _tracker_with_consensus(k=3, radius=0.4, max_age=4)
# Real sheep visible at (5, 0) every frame; phantom jitters > radius.
phantom_positions = [(10.0, 5.0), (10.5, 5.6), (11.1, 5.0), (10.0, 5.7)]
for k in range(4):
t.update([(5.0, 0.0), phantom_positions[k]])
positions = t.get_positions()
assert len(positions) == 1
real_xy = next(iter(positions.values()))
assert math.hypot(real_xy[0] - 5.0, real_xy[1]) < 0.5
# And once the candidate window has elapsed, every phantom has died.
for _ in range(8):
t.update([(5.0, 0.0)])
assert t.n_candidate() == 0
assert len(t.get_positions()) == 1
+57
View File
@@ -0,0 +1,57 @@
#!/usr/bin/env bash
# Measure the actual velocity response of the Webots mecanum robot and
# compare against the gym's first-order kinematics prediction.
#
# Uses HERDING_MODE=calibrate in the shepherd_dog controller, which applies
# a known fixed action for N steps, records GPS displacement, and computes
# the relative error vs gym prediction.
#
# Usage:
# bash tools/calibrate_mecanum.sh [N_STEPS]
# N_STEPS : steps to hold each action (default 150, ≈ 2.4 s real-time)
#
# Output:
# calibrate_mecanum.log — per-axis results printed and written here
#
# Target: < 10% relative error on each axis.
# If errors are high, tune coulombFriction / forceDependentSlip in
# tools/run_webots.sh (mecanum contactProperties block).
set -euo pipefail
N_STEPS="${1:-150}"
ROOT="$( cd "$( dirname "${BASH_SOURCE[0]}" )/.." && pwd )"
LOG="$ROOT/calibrate_mecanum.log"
source "$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )/setup_env.sh"
echo "Running mecanum calibration (N_STEPS=$N_STEPS)..."
echo "Results will be written to: $LOG"
truncate -s 0 "$LOG"
run_calib() {
local vx="$1" vy="$2" om="$3"
echo " Testing vx=$vx vy=$vy om=$om ..."
rm -f "$ROOT/training/.run_done"
timeout --kill-after=15s 60 \
xvfb-run -a \
env WEBOTS_HEADLESS=1 WEBOTS_EXTRA_ARGS="--stdout --stderr" \
HERDING_MODE=calibrate HERDING_DRIVE=mecanum HERDING_WORLD=field \
CALIB_VX="$vx" CALIB_VY="$vy" CALIB_OM="$om" \
CALIB_N_STEPS="$N_STEPS" \
bash "$ROOT/tools/run_webots.sh" 0 calibrate mecanum field \
2>&1 | grep -E "cmd=|gym|webots|error" || true
pkill -9 -f "webots-bin|Xvfb" 2>/dev/null || true
sleep 1
}
# Three test vectors: pure-x, pure-y, diagonal
run_calib 0.5 0.0 0.0
run_calib 0.0 0.5 0.0
run_calib 0.35 0.35 0.0
echo ""
echo "=== Calibration results ==="
cat "$LOG" 2>/dev/null || echo "(no results written — check controller output above)"
echo ""
echo "Target: <10% error on each axis."
echo "If errors are high, tune coulombFriction / forceDependentSlip in"
echo "tools/run_webots.sh (mecanum contactProperties block)."
+67
View File
@@ -0,0 +1,67 @@
#!/usr/bin/env bash
# Run one DAgger round on a (drive, world) combo.
#
# Usage: tools/dagger_round.sh <drive> <world> [seeds_per_n] [round_idx]
#
# Collects DAgger demos using the current BC policy as the actor and the
# universal teacher as the labeller, in the HERDING_WEBOTS preset env
# (140° FOV, tight tracker — matches deployment). Concatenates with the
# original BC demos, re-trains BC, and saves to runs/bc_dagger_<combo>/.
set -euo pipefail
ROOT="$( cd "$( dirname "${BASH_SOURCE[0]}" )/.." && pwd )"
cd "$ROOT"
DRIVE="${1:-differential}"
WORLD="${2:-field}"
SEEDS="${3:-15}"
ROUND="${4:-1}"
TAG="${DRIVE}_${WORLD}"
ORIG_DEMOS="training/bc/demos_${TAG}.npz"
DAGGER_DEMOS="training/bc/dagger${ROUND}_${TAG}.npz"
COMBINED_DEMOS="training/bc/combined${ROUND}_${TAG}.npz"
BC_DIR="training/runs/bc_${TAG}"
OUT_DIR="training/runs/bc_dagger${ROUND}_${TAG}"
case "$WORLD" in
field_round)
EPOCHS=150
;;
*)
EPOCHS=60
;;
esac
echo "=== DAgger round ${ROUND}: ${DRIVE}/${WORLD} ==="
echo " Actor policy: ${BC_DIR}/policy.zip"
echo " Output: ${OUT_DIR}/policy.zip"
# 1. Collect DAgger demos: BC drives, teacher labels (privileged + HERDING_WEBOTS).
python -m training.bc.collect \
--teacher universal --out "$DAGGER_DEMOS" \
--seeds-per-n "$SEEDS" --subsample 3 \
--frame-stack 4 --drive-mode "$DRIVE" --world "$WORLD" \
--max-steps 30000 \
--privileged --use-webots-preset \
--fp-rate 0.0 --action-smooth 0.55 --wheel-slip-std 0.05 \
--dagger-policy "$BC_DIR"
# 2. Concatenate original demos + dagger demos.
python - <<PY
import numpy as np
orig = np.load("${ORIG_DEMOS}")
dag = np.load("${DAGGER_DEMOS}")
obs = np.concatenate([orig["obs"], dag["obs"]], axis=0)
act = np.concatenate([orig["actions"], dag["actions"]], axis=0)
np.savez("${COMBINED_DEMOS}", obs=obs, actions=act,
meta=np.concatenate([orig["meta"], dag["meta"]], axis=0))
print(f"[combine] orig={orig['obs'].shape[0]} + dagger={dag['obs'].shape[0]} = {obs.shape[0]}")
PY
# 3. Re-train BC on combined demos.
python -m training.bc.pretrain \
--demos "$COMBINED_DEMOS" --out "$OUT_DIR" \
--epochs "$EPOCHS" --net-arch 512,512
echo "=== DAgger round ${ROUND} done: ${OUT_DIR}/policy.zip ==="
+86
View File
@@ -0,0 +1,86 @@
#!/usr/bin/env bash
# Full retrain + eval + Webots-validate pipeline.
#
# Usage: bash tools/full_pipeline.sh
#
# Output logs are written to the repo root:
# full_pipeline.log — main pipeline log
# stage_train.log — make train_all output
# stage_eval.log — make eval_all output
# stage_webots.log — Webots validation sweep
#
# Total runtime estimate: 812 hours.
set -e
ROOT="$( cd "$( dirname "${BASH_SOURCE[0]}" )/.." && pwd )"
cd "$ROOT"
source "$ROOT/tools/setup_env.sh"
PIPELINE_LOG="$ROOT/full_pipeline.log"
TRAIN_LOG="$ROOT/stage_train.log"
EVAL_LOG="$ROOT/stage_eval.log"
WEBOTS_LOG="$ROOT/stage_webots.log"
truncate -s 0 "$PIPELINE_LOG" "$TRAIN_LOG" "$EVAL_LOG" "$WEBOTS_LOG"
log() { echo "[pipeline $(date +%H:%M:%S)] $*" | tee -a "$PIPELINE_LOG"; }
log "=== START full pipeline $(date) ==="
log ""
log "Phase 1/4: clean_all"
make clean_all 2>&1 | tee -a "$PIPELINE_LOG"
log ""
log "Phase 2/4: train_all (4 combos, ~8h)"
make train_all 2>&1 | tee -a "$TRAIN_LOG"
log " train_all finished"
log ""
log "Phase 3/4: eval_all (gym eval, ~30min)"
make eval_all 2>&1 | tee -a "$EVAL_LOG"
log " eval_all finished"
log ""
log "Phase 4/4: Webots validation sweep (~90min)"
truncate -s 0 "$WEBOTS_LOG"
run_cell() {
local MODE="$1" DRIVE="$2" WORLD="$3" N="$4"
echo "" | tee -a "$WEBOTS_LOG"
echo "=== $MODE $DRIVE $WORLD n=$N ===" | tee -a "$WEBOTS_LOG"
rm -f "$ROOT/training/.run_done"
local STDOUT="$ROOT/pipeline_${MODE}_${DRIVE}_${WORLD}_n${N}.stdout"
timeout --kill-after=15s 320 \
xvfb-run -a \
env WEBOTS_HEADLESS=1 WEBOTS_EXTRA_ARGS="--stdout --stderr" \
HERDING_SEED=42 \
bash tools/run_webots.sh "$N" "$MODE" "$DRIVE" "$WORLD" > "$STDOUT" 2>&1 || true
BEST=$(grep "GT_penned=" "$STDOUT" 2>/dev/null | awk -F'GT_penned=' '{print $2}' | awk '{split($1,a,"/"); print a[1]"/"a[2]}' | sort -t/ -k1,1n | tail -1)
grep -E "\[results\]" "$STDOUT" 2>/dev/null | head -1 | tee -a "$WEBOTS_LOG"
echo " best GT_penned: $BEST" | tee -a "$WEBOTS_LOG"
pkill -9 -f "webots-bin|Xvfb" 2>/dev/null || true
sleep 1
}
# Differential drive: 4 controllers × 2 worlds × 2 n
for M in bc rl strombom sequential; do
for W in field field_round; do
for N in 5 10; do
run_cell "$M" differential "$W" "$N"
done
done
done
# Mecanum drive: 2 controllers × 2 worlds × 2 n
for M in bc rl; do
for W in field field_round; do
for N in 5 10; do
run_cell "$M" mecanum "$W" "$N"
done
done
done
log ""
log "=== FULL PIPELINE DONE $(date) ==="
log ""
log "Summary:"
grep -E "=== |best GT_penned" "$WEBOTS_LOG" | tee -a "$PIPELINE_LOG"
+210
View File
@@ -0,0 +1,210 @@
"""Generate ShepherdDogMecanum.proto wheel blocks with physical rollers.
Each wheel becomes:
HingeJoint (motor, axis 0 1 0 = body lateral)
-> Solid (wheel hub, rotation 0 -1 0 π/2)
children:
- WHEEL_VIS (visual, kept as-is for appearance)
- 8x HingeJoint (passive roller, axis tilted ±45° from wheel rotation
axis, tangent to the wheel circumference at the mount
point)
-> Solid (capsule)
boundingObject: a small Cylinder for the hub (smaller radius than the
roller circle so the hub doesn't touch the ground)
X-pattern roller tilt assignment:
FR, RL -> -45° (wheel-axis-relative)
FL, RR -> +45°
All math is done in the WHEEL SOLID's local frame. The wheel solid's
rotation `0 -1 0 π/2` takes wheel-local x -> body +z (up),
wheel-local y -> body +y (lateral, = wheel rotation axis),
wheel-local z -> body -x (rearward). Conversely, a body-frame offset
(dx, dy, dz) becomes (dz, dy, -dx) in wheel-local coords.
For a wheel rotating about body y at angle θ (θ=0 = body +x = forward,
θ=π/2 = body +z = top), the roller mount in body frame is
(R*cos(θ), 0, R*sin(θ)) relative to wheel centre. Tangent (radial-perp,
in the wheel-spin plane) is (-sin(θ), 0, cos(θ)); the wheel rotation
axis is (0, 1, 0). Roller axis tilted +45° from tangent toward wheel
axis:
axis_body(+45°) = (1/√2) * (-sin(θ), +1, cos(θ))
axis_body(-45°) = (1/√2) * (-sin(θ), -1, cos(θ))
Transformed to wheel-local: (dz, dy, -dx) on each component gives
mount_local = (R*sin(θ), 0, -R*cos(θ))
axis_local(+45) = (cos(θ)/√2, +1/√2, sin(θ)/√2)
axis_local(-45) = (cos(θ)/√2, -1/√2, sin(θ)/√2)
The Solid's `rotation` field needs to align the Capsule's default
axis (+y) with that local axis. The minimal axis-angle that does this:
rotation_axis = (sin(θ), 0, -cos(θ)) (unit)
rotation_angle = π/4 for +45° tilt, 3π/4 for -45° tilt
"""
import math
WHEEL_NAMES = {
# Tilt sign refers to roller-axis tilt direction relative to the wheel
# rotation axis (body +y). X-pattern requires rollers on each wheel to
# tilt INWARD toward the body centre. For a wheel at +y body coord, that
# means tilting toward -y; for a wheel at -y, tilting toward +y.
"fr": ("front right", +0.14, -0.14, +1), # +1 = +45° tilt (toward +y, inward)
"fl": ("front left", +0.14, +0.14, -1), # -1 = -45° tilt (toward -y, inward)
"rr": ("rear right", -0.14, -0.14, -1), # -1 (toward -y, "outward"...
"rl": ("rear left", -0.14, +0.14, +1), # +1 (toward +y, "outward"...
# ...for the rear pair the X-pattern flips so diagonal pairs FL+RR have
# SAME tilt direction in body frame, FR+RL the other. The signs above
# encode that: FR/RL both +1, FL/RR both -1.
}
R_ROLLER_OFFSET = 0.031 # roller-centre distance from wheel hub centre
R_ROLLER_RADIUS = 0.007
R_ROLLER_HEIGHT = 0.020
ROLLER_MASS = 0.003
HUB_RADIUS = 0.020 # < R_ROLLER_OFFSET - R_ROLLER_RADIUS so hub doesn't touch
HUB_HEIGHT = 0.022
HUB_MASS = 0.045
N_ROLLERS = 8
def wheel_block(key):
name, ax, ay, tilt_sign = WHEEL_NAMES[key]
contact_mat = "MecanumWheelA" if tilt_sign > 0 else "MecanumWheelB"
safe = name.replace(" ", "_").upper()
rollers = []
for k in range(N_ROLLERS):
theta = 2.0 * math.pi * k / N_ROLLERS
s, c = math.sin(theta), math.cos(theta)
# Mount position in wheel-local frame.
mx = R_ROLLER_OFFSET * s
my = 0.0
mz = -R_ROLLER_OFFSET * c
# Hinge axis in wheel-local frame.
ax_l = c / math.sqrt(2.0)
ay_l = tilt_sign / math.sqrt(2.0)
az_l = s / math.sqrt(2.0)
# Rotation that maps Capsule default axis (0,1,0) to (ax_l, ay_l, az_l).
rot_axis = (s, 0.0, -c)
rot_angle = math.pi / 4.0 if tilt_sign > 0 else 3.0 * math.pi / 4.0
rollers.append(f"""\
# Mecanum roller {k+1} (θ={math.degrees(theta):.0f}°)
HingeJoint {{
jointParameters HingeJointParameters {{
axis {ax_l:.6f} {ay_l:.6f} {az_l:.6f}
anchor {mx:.6f} {my:.6f} {mz:.6f}
}}
endPoint Solid {{
translation {mx:.6f} {my:.6f} {mz:.6f}
rotation {rot_axis[0]:.6f} {rot_axis[1]:.6f} {rot_axis[2]:.6f} {rot_angle:.6f}
children [
Shape {{
appearance PBRAppearance {{
baseColor 0.12 0.12 0.12
roughness 0.7
metalness 0.1
}}
geometry Capsule {{
height {R_ROLLER_HEIGHT}
radius {R_ROLLER_RADIUS}
subdivision 8
}}
}}
]
name "{name} roller {k+1}"
contactMaterial "{contact_mat}"
boundingObject Capsule {{
height {R_ROLLER_HEIGHT}
radius {R_ROLLER_RADIUS}
subdivision 8
}}
physics Physics {{
density -1
mass {ROLLER_MASS}
centerOfMass [
0 0 0
]
}}
}}
}}""")
rollers_str = "\n".join(rollers)
return f"""\
# ========== {name.upper()} WHEEL ==========
DEF {safe}_WHEEL_JOINT HingeJoint {{
jointParameters HingeJointParameters {{
axis 0 1 0
anchor {ax} {ay} 0.038
}}
device [
RotationalMotor {{
name "{name} wheel motor"
maxVelocity 70.0
maxTorque 20.0
}}
PositionSensor {{
name "{name} wheel sensor"
resolution 0.00628
}}
]
endPoint Solid {{
translation {ax} {ay} 0.038
rotation 0 -1 0 1.570796
children [
# Visual hub only — the rollers below provide ground contact.
Pose {{
rotation 1 0 0 -1.5708
children [
Shape {{
appearance PBRAppearance {{
baseColor 0.5 0.5 0.5
roughness 0.3
metalness 0.7
}}
geometry Cylinder {{
height 0.018
radius {HUB_RADIUS - 0.002}
subdivision 16
}}
}}
Shape {{
appearance PBRAppearance {{
baseColor 0.6 0.6 0.6
roughness 0.2
metalness 0.8
}}
geometry Cylinder {{
height 0.022
radius 0.008
subdivision 8
}}
}}
]
}}
{rollers_str}
]
name "{name} wheel"
boundingObject Pose {{
rotation 1 0 0 -1.5708
children [
Cylinder {{
height {HUB_HEIGHT}
radius {HUB_RADIUS}
}}
]
}}
physics Physics {{
density -1
mass {HUB_MASS}
centerOfMass [
0 0 0
]
}}
}}
}}"""
if __name__ == "__main__":
for k in ("fr", "fl", "rr", "rl"):
print(wheel_block(k))
print()
+147 -33
View File
@@ -33,17 +33,23 @@
# WEBOTS_EXTRA_ARGS="--stdout --stderr" WEBOTS_HEADLESS=1 tools/run_webots.sh 10 rl # WEBOTS_EXTRA_ARGS="--stdout --stderr" WEBOTS_HEADLESS=1 tools/run_webots.sh 10 rl
set -e set -e
# Make sure HERDING_PYTHON is resolved and on PATH so Webots inherits
# the right interpreter (controllers/{shepherd_dog,sheep}/runtime.ini
# both read $HERDING_PYTHON via env-var expansion).
source "$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )/setup_env.sh"
N=${1:-10} N=${1:-10}
MODE=${2:-bc} MODE=${2:-bc}
DRIVE=${3:-differential} DRIVE=${3:-differential}
WORLD=${4:-field} WORLD=${4:-field}
if (( N < 1 || N > 10 )); then if (( N < 0 || N > 10 )); then
echo "N must be 1..10, got $N" >&2; exit 1 echo "N must be 0..10, got $N" >&2; exit 1
fi fi
case "$MODE" in case "$MODE" in
bc|rl|strombom|sequential|universal) ;; bc|rl|strombom|sequential|universal|calibrate) ;;
*) echo "MODE must be bc|rl|strombom|sequential|universal, got '$MODE'" >&2; exit 1 ;; *) echo "MODE must be bc|rl|strombom|sequential|universal|calibrate, got '$MODE'" >&2; exit 1 ;;
esac esac
case "$DRIVE" in case "$DRIVE" in
differential|mecanum) ;; differential|mecanum) ;;
@@ -60,52 +66,103 @@ DST="$ROOT/worlds/${WORLD}_test.wbt"
if [[ -n "${HERDING_POLICY_DIR:-}" ]]; then if [[ -n "${HERDING_POLICY_DIR:-}" ]]; then
RESOLVED_POLICY_DIR="$HERDING_POLICY_DIR" RESOLVED_POLICY_DIR="$HERDING_POLICY_DIR"
else else
# Try drive-mode-specific path first, then legacy path. # The training pipeline writes policies to:
# training/runs/{bc,rl}_<drive>_<world>
# Try that first; fall back to the drive-only and finally the
# bare-mode legacy paths so older policy checkouts still load.
if [[ "$MODE" == "rl" ]]; then if [[ "$MODE" == "rl" ]]; then
DRIVED="$ROOT/training/runs/rl_${DRIVE}" BASE="rl"
LEGACY="$ROOT/training/runs/rl"
else else
DRIVED="$ROOT/training/runs/bc_${DRIVE}" BASE="bc"
LEGACY="$ROOT/training/runs/bc"
fi
if [[ -d "$DRIVED" ]]; then
RESOLVED_POLICY_DIR="$DRIVED"
else
RESOLVED_POLICY_DIR="$LEGACY"
fi fi
for CAND in \
"$ROOT/training/runs/${BASE}_${DRIVE}_${WORLD}" \
"$ROOT/training/runs/${BASE}_${DRIVE}" \
"$ROOT/training/runs/${BASE}"
do
if [[ -d "$CAND" ]]; then
RESOLVED_POLICY_DIR="$CAND"
break
fi
done
: "${RESOLVED_POLICY_DIR:=$ROOT/training/runs/${BASE}_${DRIVE}_${WORLD}}"
fi fi
cp "$SRC" "$DST" cp "$SRC" "$DST"
# Swap robot proto based on drive mode. # LiDAR FOV variant. Mecanum defaults to 360° (the trained mecanum
# Base worlds reference ShepherdDog (diff-drive). For mecanum we swap in # target); diff defaults to 140°. Override with HERDING_LIDAR=140 or
# ShepherdDogMecanum and inject mecanum contact properties. # HERDING_LIDAR=360 for ablations.
if [[ "$DRIVE" == "mecanum" ]]; then if [[ -z "${HERDING_LIDAR:-}" ]]; then
if [[ "$DRIVE" == "mecanum" ]]; then
LIDAR_VARIANT="360"
else
LIDAR_VARIANT="140"
fi
else
LIDAR_VARIANT="$HERDING_LIDAR"
fi
if [[ "$LIDAR_VARIANT" != "140" && "$LIDAR_VARIANT" != "360" ]]; then
echo "HERDING_LIDAR must be 140 or 360, got '$LIDAR_VARIANT'" >&2; exit 1
fi
export HERDING_LIDAR="$LIDAR_VARIANT"
# Swap robot proto based on drive mode + LiDAR variant.
# Base worlds reference ShepherdDog (diff-drive 140°). The four
# combinations the launcher supports:
# diff + 140° → ShepherdDog.proto (default)
# diff + 360° → ShepherdDog360.proto (FOV ablation for diff)
# mecanum+ 140° → ShepherdDogMecanum.proto
# mecanum+ 360° → ShepherdDogMecanum360.proto (the trained mecanum target)
if [[ "$DRIVE" == "mecanum" && "$LIDAR_VARIANT" == "360" ]]; then
sed -i 's|"../protos/ShepherdDog.proto"|"../protos/ShepherdDogMecanum360.proto"|' "$DST"
sed -i 's|^ShepherdDog {|ShepherdDogMecanum360 {|' "$DST"
elif [[ "$DRIVE" == "mecanum" ]]; then
sed -i 's|"../protos/ShepherdDog.proto"|"../protos/ShepherdDogMecanum.proto"|' "$DST" sed -i 's|"../protos/ShepherdDog.proto"|"../protos/ShepherdDogMecanum.proto"|' "$DST"
sed -i 's|^ShepherdDog {|ShepherdDogMecanum {|' "$DST" sed -i 's|^ShepherdDog {|ShepherdDogMecanum {|' "$DST"
# Inject mecanum contact properties after the existing contactProperties block. elif [[ "$LIDAR_VARIANT" == "360" ]]; then
sed -i 's|"../protos/ShepherdDog.proto"|"../protos/ShepherdDog360.proto"|' "$DST"
sed -i 's|^ShepherdDog {|ShepherdDog360 {|' "$DST"
fi
if [[ "$DRIVE" == "mecanum" ]]; then
# Wheel-ground friction. The chassis is driven kinematically by
# the Supervisor (see drive_mecanum in controllers/shepherd_dog),
# so these properties only affect wheel-spin visuals, not the
# robot's motion. coulombFriction 2.0 plus a small
# forceDependentSlip keeps the rollers from locking up against
# the ground.
python3 -c " python3 -c "
import re, sys with open('$DST', 'r') as f:
with open(sys.argv[1], 'r') as f:
txt = f.read() txt = f.read()
# Find the closing ']' of contactProperties and insert before it. mec = ''' ContactProperties {
mec = ''' material1 \"MecanumWheelA\"
ContactProperties {
material1 \"MecanumWheel\"
coulombFriction [ coulombFriction [
2 2.0
] ]
bounce 0 bounce 0
forceDependentSlip [ forceDependentSlip [
10 0.005
] ]
softCFM 0.0001 softCFM 0.0001
}''' }
# Insert before the first ']' that closes contactProperties [...] ContactProperties {
txt = re.sub(r'(contactProperties\s*\[[^\]]*)(\])', r'\1' + mec + r'\2', txt, count=1) material1 \"MecanumWheelB\"
with open(sys.argv[1], 'w') as f: coulombFriction [
2.0
]
bounce 0
forceDependentSlip [
0.005
]
softCFM 0.0001
}
'''
# The contactProperties array closes with ' ]\n}' (2-space indent ] then WorldInfo }).
# Insert the new block just before that closing ].
txt = txt.replace('\n ]\n}', '\n' + mec + ' ]\n}', 1)
with open('$DST', 'w') as f:
f.write(txt) f.write(txt)
" "$DST" "
fi fi
# Comment out sheep N+1..10 by prefixing the matching Sheep { ... } line. # Comment out sheep N+1..10 by prefixing the matching Sheep { ... } line.
@@ -113,11 +170,62 @@ for i in $(seq $((N+1)) 10); do
sed -i "s|^Sheep .* \"sheep${i}\".*|# &|" "$DST" sed -i "s|^Sheep .* \"sheep${i}\".*|# &|" "$DST"
done done
active=$(grep -c '^Sheep' "$DST") # Dual-dog axis split. When HERDING_NDOGS=2 the launcher replaces the
# single dog node in the world with two named dogs whose customData
# carries the axis assignment (x or y); the controller masks the
# off-axis component of every action.
NDOGS="${HERDING_NDOGS:-1}"
if [[ "$NDOGS" != "1" && "$NDOGS" != "2" ]]; then
echo "HERDING_NDOGS must be 1 or 2, got '$NDOGS'" >&2; exit 1
fi
if [[ "$NDOGS" == "2" ]]; then
DOG_NODE_NAME="ShepherdDog"
if [[ "$DRIVE" == "mecanum" ]]; then
DOG_NODE_NAME="ShepherdDogMecanum"
elif [[ "$LIDAR_VARIANT" == "360" ]]; then
DOG_NODE_NAME="ShepherdDog360"
fi
python3 - "$DST" "$DOG_NODE_NAME" <<'PY'
import re, sys
path, node = sys.argv[1], sys.argv[2]
with open(path) as f:
txt = f.read()
# Match the single existing dog block from "ShepherdDog{,360,Mecanum} {"
# up to its closing "}" on a line by itself.
pattern = re.compile(rf"^{re.escape(node)} \{{\n(.*?\n)^\}}\n", re.MULTILINE | re.DOTALL)
m = pattern.search(txt)
if m is None:
sys.exit(f"[run_webots] could not locate single-dog block ({node}) for split")
two_dogs = (
f"{node} {{\n"
f" translation -4 -10 0.5\n"
f" rotation 0 0 1 1.5708\n"
f' name "ShepherdDogX"\n'
f' customData "axis=x"\n'
f' controller "shepherd_dog"\n'
f"}}\n"
f"{node} {{\n"
f" translation 4 -10 0.5\n"
f" rotation 0 0 1 1.5708\n"
f' name "ShepherdDogY"\n'
f' customData "axis=y"\n'
f' controller "shepherd_dog"\n'
f"}}\n"
)
with open(path, 'w') as f:
f.write(txt[:m.start()] + two_dogs + txt[m.end():])
PY
fi
export HERDING_NDOGS="$NDOGS"
active=$(grep -c '^Sheep' "$DST" || true)
ndog=$(grep -cE '^(ShepherdDog|ShepherdDog360|ShepherdDogMecanum) \{' "$DST" || true)
echo "------------------------------------------------------------" echo "------------------------------------------------------------"
echo "World : $DST" echo "World : $DST"
echo "Mode : $MODE" echo "Mode : $MODE"
echo "Drive : $DRIVE" echo "Drive : $DRIVE"
echo "LiDAR : ${LIDAR_VARIANT}°"
echo "Dogs : $ndog (axis-split=${NDOGS})"
echo "Sheep : $active active" echo "Sheep : $active active"
echo "Policy dir : $RESOLVED_POLICY_DIR" echo "Policy dir : $RESOLVED_POLICY_DIR"
echo "------------------------------------------------------------" echo "------------------------------------------------------------"
@@ -129,12 +237,18 @@ HERDING_MODE=$MODE
HERDING_POLICY_DIR=$RESOLVED_POLICY_DIR HERDING_POLICY_DIR=$RESOLVED_POLICY_DIR
HERDING_DRIVE=$DRIVE HERDING_DRIVE=$DRIVE
HERDING_WORLD=$WORLD HERDING_WORLD=$WORLD
HERDING_LIDAR=$LIDAR_VARIANT
HERDING_NDOGS=$NDOGS
HERDING_AXIS_LEAK=${HERDING_AXIS_LEAK:-0.3}
HERDING_USE_GT=${HERDING_USE_GT:-0}
HERDING_SEED=${HERDING_SEED:-}
EOF EOF
export HERDING_MODE="$MODE" export HERDING_MODE="$MODE"
export HERDING_POLICY_DIR="$RESOLVED_POLICY_DIR" export HERDING_POLICY_DIR="$RESOLVED_POLICY_DIR"
export HERDING_DRIVE="$DRIVE" export HERDING_DRIVE="$DRIVE"
export HERDING_WORLD="$WORLD" export HERDING_WORLD="$WORLD"
export HERDING_LIDAR="$LIDAR_VARIANT"
# The controller writes this sentinel when all GT sheep are penned. We # The controller writes this sentinel when all GT sheep are penned. We
# poll for it and kill Webots so the run finishes cleanly instead of # poll for it and kill Webots so the run finishes cleanly instead of
+23
View File
@@ -0,0 +1,23 @@
# Source this from your shell before running the launchers:
#
# source tools/setup_env.sh
#
# The launchers (`tools/run_webots.sh`, `tools/webots_sweep*.sh`,
# `tools/calibrate_mecanum.sh`) and the Webots controllers (via
# `controllers/*/runtime.ini`) all read $HERDING_PYTHON to decide
# which Python interpreter to use. The default below points at the
# project author's conda env — edit this file or override the var in
# your shell to match your own setup.
: "${HERDING_PYTHON:=/home/jalf/miniconda3/envs/tir/bin/python3}"
export HERDING_PYTHON
# Prepend the Python's bin/ to PATH so subprocesses pick up the same
# interpreter (used by Webots when it doesn't read runtime.ini, and
# by any Python tooling launched by the bash scripts).
export PATH="$(dirname "$HERDING_PYTHON"):$PATH"
if [[ ! -x "$HERDING_PYTHON" ]]; then
echo "[setup_env] WARNING: HERDING_PYTHON=$HERDING_PYTHON is not executable." >&2
echo "[setup_env] Edit tools/setup_env.sh or 'export HERDING_PYTHON=...' yourself." >&2
fi
+197
View File
@@ -0,0 +1,197 @@
#!/usr/bin/env bash
# Interactive Webots launcher. Prompts for the experiment knobs
# (mode, drive, world, LiDAR FOV, number of dogs, flock size, GT
# bypass) and then dispatches to tools/run_webots.sh with the
# selected configuration.
#
# Usage: bash tools/webots_menu.sh
set -e
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
ROOT="$( cd "$SCRIPT_DIR/.." && pwd )"
# Resolve HERDING_PYTHON the same way every other launcher does.
source "$SCRIPT_DIR/setup_env.sh"
# ----- Cosmetics ----------------------------------------------------
if [[ -t 1 ]]; then
B=$'\e[1m'; D=$'\e[2m'; R=$'\e[0m'
G=$'\e[32m'; Y=$'\e[33m'; C=$'\e[36m'
else
B=""; D=""; R=""; G=""; Y=""; C=""
fi
banner () {
cat <<EOF
${B}${C}┌──────────────────────────────────────────────────────────────────┐
│ Shepherd-dog Webots launcher (interactive) │
└──────────────────────────────────────────────────────────────────┘${R}
${Y}⚠ Python interpreter${R}
This script and the Webots controllers read ${B}\$HERDING_PYTHON${R} to
decide which interpreter to start. Current value:
${G}$HERDING_PYTHON${R}
${D}If that path is wrong on your machine, edit ${R}${B}tools/setup_env.sh${R}${D}
or export HERDING_PYTHON=/path/to/python3 in your shell.${R}
EOF
}
ask_choice () {
# ask_choice "Prompt" "default" "label1:val1" "label2:val2" ...
local prompt="$1" default="$2"; shift 2
local i=1 labels=() values=()
for opt in "$@"; do
labels+=("${opt%%:*}")
values+=("${opt#*:}")
done
while true; do
echo "${B}$prompt${R}"
for i in "${!labels[@]}"; do
local marker=" "
[[ "${values[$i]}" == "$default" ]] && marker="${G}*${R}"
printf " %s %d) ${B}%s${R}\n" "$marker" "$((i+1))" "${labels[$i]}"
done
printf " Choice [${G}1-${#labels[@]}${R}, default ${G}%s${R}]: " "$default"
local raw; read -r raw || true
raw="${raw:-}"
if [[ -z "$raw" ]]; then
CHOICE="$default"; return
fi
if [[ "$raw" =~ ^[0-9]+$ ]] && (( raw >= 1 && raw <= ${#labels[@]} )); then
CHOICE="${values[$((raw-1))]}"; return
fi
echo " ${Y}invalid — try again${R}"
done
}
ask_int () {
# ask_int "Prompt" default min max
local prompt="$1" default="$2" lo="$3" hi="$4"
while true; do
printf "${B}%s${R} [${G}%s${R}-${G}%s${R}, default ${G}%s${R}]: " "$prompt" "$lo" "$hi" "$default"
local raw; read -r raw || true
raw="${raw:-$default}"
if [[ "$raw" =~ ^[0-9]+$ ]] && (( raw >= lo && raw <= hi )); then
CHOICE="$raw"; return
fi
echo " ${Y}must be an integer in [$lo, $hi]${R}"
done
}
# ----- Prompts ------------------------------------------------------
banner
ask_choice "Mode" "bc" \
"BC (behaviour-cloned MLP):bc" \
"RL (KL-PPO fine-tune):rl" \
"Strömbom (analytic):strombom" \
"Sequential (analytic):sequential" \
"Universal teacher (BC source):universal"
MODE="$CHOICE"
echo
ask_choice "Drive" "differential" \
"Differential (2-wheel):differential" \
"Mecanum (4-wheel, omnidirectional):mecanum"
DRIVE="$CHOICE"
echo
ask_choice "World" "field" \
"Rectangular (field):field" \
"Round (field_round):field_round"
WORLD="$CHOICE"
echo
# LiDAR ablation only applies to differential (mecanum proto has its
# own 140° sensor that we don't fork).
if [[ "$DRIVE" == "differential" ]]; then
ask_choice "LiDAR FOV" "140" \
"140° (canonical, ShepherdDog.proto):140" \
"360° (FOV ablation, ShepherdDog360.proto):360"
LIDAR="$CHOICE"
else
LIDAR="140"
echo "${D}LiDAR: 140° (mecanum drive — no 360° proto variant available)${R}"
fi
echo
ask_choice "Number of shepherd dogs" "1" \
"1 — solo:1" \
"2 — axis-split (X-dog + Y-dog):2"
NDOGS="$CHOICE"
echo
if [[ "$NDOGS" == "2" ]]; then
ask_choice "Axis-split leak (soft mask gain on the off-axis)" "0.3" \
"0.0 — strict (each dog only moves on its axis; tends to deadlock):0.0" \
"0.3 — default (off-axis at 30% gain; verified to pen):0.3" \
"0.5 — softer:0.5" \
"1.0 — no mask (both dogs run full policy):1.0"
AXIS_LEAK="$CHOICE"
echo
fi
ask_int "Flock size (number of sheep)" 5 1 10
N_SHEEP="$CHOICE"
echo
ask_choice "Perception" "lidar" \
"LiDAR (canonical):lidar" \
"Ground-truth bypass (HERDING_USE_GT=1):gt"
if [[ "$CHOICE" == "gt" ]]; then USE_GT=1; else USE_GT=0; fi
echo
ask_choice "Seed" "random" \
"Random (different sheep wander each run):random" \
"Fixed seed (reproducible run — pick an integer):fixed"
if [[ "$CHOICE" == "fixed" ]]; then
ask_int " → Seed value" 0 0 1000000
SEED="$CHOICE"
else
SEED=""
fi
echo
ask_choice "Headless?" "no" \
"No — show the Webots window:no" \
"Yes — headless, fast simulation (xvfb-run):yes"
HEADLESS="$CHOICE"
echo
# ----- Summary ------------------------------------------------------
cat <<EOF
${B}${C}── Launch configuration ──────────────────────────────────────────${R}
Mode : ${B}$MODE${R}
Drive : ${B}$DRIVE${R}
World : ${B}$WORLD${R}
LiDAR FOV : ${B}${LIDAR}°${R}
Dogs : ${B}$NDOGS${R}$( [[ "$NDOGS" == "2" ]] && echo " (axis_leak=${B}$AXIS_LEAK${R})" )
Sheep : ${B}$N_SHEEP${R}
Perception : ${B}$( [[ "$USE_GT" == "1" ]] && echo "GT bypass" || echo "LiDAR" )${R}
Seed : ${B}$( [[ -n "$SEED" ]] && echo "$SEED" || echo "random" )${R}
Headless : ${B}$HEADLESS${R}
${C}──────────────────────────────────────────────────────────────────${R}
EOF
printf "${B}Launch? [Y/n] ${R}"
read -r confirm || true
if [[ "$confirm" =~ ^[Nn] ]]; then
echo "Aborted."; exit 0
fi
# ----- Dispatch -----------------------------------------------------
export HERDING_LIDAR="$LIDAR"
export HERDING_NDOGS="$NDOGS"
export HERDING_USE_GT="$USE_GT"
[[ -n "${AXIS_LEAK:-}" ]] && export HERDING_AXIS_LEAK="$AXIS_LEAK"
[[ -n "$SEED" ]] && export HERDING_SEED="$SEED"
if [[ "$HEADLESS" == "yes" ]]; then
export WEBOTS_HEADLESS=1
export WEBOTS_EXTRA_ARGS="--stdout --stderr"
exec xvfb-run -a bash "$SCRIPT_DIR/run_webots.sh" \
"$N_SHEEP" "$MODE" "$DRIVE" "$WORLD"
else
exec bash "$SCRIPT_DIR/run_webots.sh" \
"$N_SHEEP" "$MODE" "$DRIVE" "$WORLD"
fi
+101
View File
@@ -0,0 +1,101 @@
#!/usr/bin/env bash
# Headless Webots sweep across modes, drives, worlds, and flock sizes.
# Runs sequentially; each trial gets a hard 150s wall-clock timeout.
# Results are written to webots_sweep.log (tab-separated) and printed.
#
# Usage: bash tools/webots_sweep.sh [output_log]
set -euo pipefail
ROOT="$( cd "$( dirname "${BASH_SOURCE[0]}" )/.." && pwd )"
OUT="${1:-$ROOT/webots_sweep.log}"
TIMEOUT_S=120 # ~80k steps in fast headless mode — covers slow-converging physics
# Source the project python path. Edit tools/setup_env.sh or override
# HERDING_PYTHON in your shell to point at a Python with SB3+PyTorch.
source "$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )/setup_env.sh"
# Columns: mode drive world n_sheep success steps
printf "%-12s %-14s %-12s %7s %7s %s\n" \
"mode" "drive" "world" "n_sheep" "success" "steps" | tee "$OUT"
printf '%s\n' "$(printf '%-12s %-14s %-12s %7s %7s %s' \
'----' '-----' '-----' '-------' '-------' '-----')" | tee -a "$OUT"
run_trial() {
local mode="$1" drive="$2" world="$3" n="$4" policy_dir="${5:-}"
local done_file="$ROOT/training/.run_done"
rm -f "$done_file"
local extra_env=()
extra_env+=(WEBOTS_HEADLESS=1)
extra_env+=(WEBOTS_EXTRA_ARGS="--stdout --stderr")
extra_env+=(HERDING_USE_GT=0)
if [[ -n "$policy_dir" ]]; then
extra_env+=(HERDING_POLICY_DIR="$ROOT/$policy_dir")
fi
local raw
raw=$(
timeout --kill-after=15s "$TIMEOUT_S" \
xvfb-run -a \
env "${extra_env[@]}" \
bash "$ROOT/tools/run_webots.sh" "$n" "$mode" "$drive" "$world" \
2>&1
) || true
# Webots-bin and Xvfb can survive the timeout; kill any orphans now.
pkill -9 -f "webots-bin|Xvfb" 2>/dev/null || true
sleep 1
local success="FAIL"
local steps="timeout"
if echo "$raw" | grep -q "\[dog\] all .* sheep penned at step"; then
success="OK"
steps=$(echo "$raw" | grep "\[dog\] all .* sheep penned at step" \
| grep -oP 'step \K[0-9]+')
fi
printf "%-12s %-14s %-12s %7s %7s %s\n" \
"$mode" "$drive" "$world" "$n" "$success" "$steps" | tee -a "$OUT"
}
# ---------------------------------------------------------------------------
# Analytic baselines (differential only — that's the story context)
# strombom / sequential: canonical baselines
# universal: the actual teacher used to collect BC demos
# ---------------------------------------------------------------------------
for mode in strombom sequential universal; do
for world in field field_round; do
for n in 5 10; do
run_trial "$mode" differential "$world" "$n"
done
done
done
# ---------------------------------------------------------------------------
# BC — world-specific policies
# ---------------------------------------------------------------------------
for drive in differential mecanum; do
for world in field field_round; do
for n in 5 10; do
run_trial bc "$drive" "$world" "$n" \
"training/runs/bc_${drive}_${world}"
done
done
done
# ---------------------------------------------------------------------------
# RL_FAST — MODE=rl with explicit HERDING_POLICY_DIR pointing to rl_fast dirs
# (run_webots.sh rejects "rl_fast" as a mode; "rl" + policy override is correct)
# ---------------------------------------------------------------------------
for drive in differential mecanum; do
for world in field field_round; do
for n in 5 10; do
run_trial rl "$drive" "$world" "$n" \
"training/runs/rl_fast_${drive}_${world}"
done
done
done
echo ""
echo "Sweep complete. Results saved to: $OUT"
+101
View File
@@ -0,0 +1,101 @@
#!/usr/bin/env bash
# Headless Webots sweep across modes, drives, worlds, and flock sizes.
# Runs sequentially; each trial gets a hard 150s wall-clock timeout.
# Results are written to webots_sweep.log (tab-separated) and printed.
#
# Usage: bash tools/webots_sweep.sh [output_log]
set -euo pipefail
ROOT="$( cd "$( dirname "${BASH_SOURCE[0]}" )/.." && pwd )"
OUT="${1:-$ROOT/webots_sweep.log}"
TIMEOUT_S=120 # ~80k steps in fast headless mode — covers slow-converging physics
# Source the project python path. Edit tools/setup_env.sh or override
# HERDING_PYTHON in your shell to point at a Python with SB3+PyTorch.
source "$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )/setup_env.sh"
# Columns: mode drive world n_sheep success steps
printf "%-12s %-14s %-12s %7s %7s %s\n" \
"mode" "drive" "world" "n_sheep" "success" "steps" | tee "$OUT"
printf '%s\n' "$(printf '%-12s %-14s %-12s %7s %7s %s' \
'----' '-----' '-----' '-------' '-------' '-----')" | tee -a "$OUT"
run_trial() {
local mode="$1" drive="$2" world="$3" n="$4" policy_dir="${5:-}"
local done_file="$ROOT/training/.run_done"
rm -f "$done_file"
local extra_env=()
extra_env+=(WEBOTS_HEADLESS=1)
extra_env+=(WEBOTS_EXTRA_ARGS="--stdout --stderr")
extra_env+=(HERDING_USE_GT=1)
if [[ -n "$policy_dir" ]]; then
extra_env+=(HERDING_POLICY_DIR="$ROOT/$policy_dir")
fi
local raw
raw=$(
timeout --kill-after=15s "$TIMEOUT_S" \
xvfb-run -a \
env "${extra_env[@]}" \
bash "$ROOT/tools/run_webots.sh" "$n" "$mode" "$drive" "$world" \
2>&1
) || true
# Webots-bin and Xvfb can survive the timeout; kill any orphans now.
pkill -9 -f "webots-bin|Xvfb" 2>/dev/null || true
sleep 1
local success="FAIL"
local steps="timeout"
if echo "$raw" | grep -q "\[dog\] all .* sheep penned at step"; then
success="OK"
steps=$(echo "$raw" | grep "\[dog\] all .* sheep penned at step" \
| grep -oP 'step \K[0-9]+')
fi
printf "%-12s %-14s %-12s %7s %7s %s\n" \
"$mode" "$drive" "$world" "$n" "$success" "$steps" | tee -a "$OUT"
}
# ---------------------------------------------------------------------------
# Analytic baselines (differential only — that's the story context)
# strombom / sequential: canonical baselines
# universal: the actual teacher used to collect BC demos
# ---------------------------------------------------------------------------
for mode in strombom sequential universal; do
for world in field field_round; do
for n in 5 10; do
run_trial "$mode" differential "$world" "$n"
done
done
done
# ---------------------------------------------------------------------------
# BC — world-specific policies
# ---------------------------------------------------------------------------
for drive in differential mecanum; do
for world in field field_round; do
for n in 5 10; do
run_trial bc "$drive" "$world" "$n" \
"training/runs/bc_${drive}_${world}"
done
done
done
# ---------------------------------------------------------------------------
# RL_FAST — MODE=rl with explicit HERDING_POLICY_DIR pointing to rl_fast dirs
# (run_webots.sh rejects "rl_fast" as a mode; "rl" + policy override is correct)
# ---------------------------------------------------------------------------
for drive in differential mecanum; do
for world in field field_round; do
for n in 5 10; do
run_trial rl "$drive" "$world" "$n" \
"training/runs/rl_fast_${drive}_${world}"
done
done
done
echo ""
echo "Sweep complete. Results saved to: $OUT"
+65 -37
View File
@@ -1,80 +1,105 @@
# Training and Evaluation Details # Training and evaluation details
This file is the command-level companion to the root README. It focuses Command-level companion to the root README. Covers demo collection,
on data collection, BC, PPO fine-tuning, evaluation flags, and generated behaviour cloning, PPO fine-tuning, and evaluation flags; use the root
artifacts; use the root README for the high-level architecture and README for the high-level architecture and Webots quick start.
Webots demo quick start.
Two stages, strictly sequential: The pipeline is two strictly-sequential stages per `(drive, world)`
combo:
``` ```
sim demos (Strömbom on tracker output, K=4 frame stack) sim demos (universal teacher on tracker output, K=4 frame stack)
bc/pretrain.py ──► runs/bc (Strömbom-imitated MLP) bc/pretrain.py ──► runs/bc_<drive>_<world> (MLP)
▼ KL-regularised PPO fine-tune ▼ KL-regularised PPO fine-tune
runs/rl (deployed `rl` mode — beats BC and Strömbom) runs/rl_<drive>_<world> (deployed `rl` mode)
``` ```
## Files ## Files
``` ```
herding_env.py — Gymnasium env (LiDAR raycast + tracker by default) herding_env.py — Gymnasium env (LiDAR raycast + tracker by default)
bc/collect.py — universal-teacher sim demos
bc/pretrain.py — MSE + cosine BC of (obs, action) demos into MlpPolicy bc/pretrain.py — MSE + cosine BC of (obs, action) demos into MlpPolicy
rl/train.py — KL-regularised PPO fine-tune of a BC checkpoint rl/train.py — KL-regularised PPO fine-tune of a BC checkpoint
rl/train_lstm.py — RecurrentPPO variant (ablation)
eval.py — multi-seed analytic / learned policy comparison eval.py — multi-seed analytic / learned policy comparison
runs/ — checkpoints (whitelisted entries in top-level .gitignore) runs/ — checkpoints (gitignored except for policy.zip)
(Unit + integration tests live in the top-level ``tests/`` directory;
run with ``python -m pytest tests/``.)
``` ```
Unit + integration tests live in the top-level `tests/`. Run with
`make test` or `python -m pytest tests/`.
## End-to-end pipeline ## End-to-end pipeline
The simplest way to run everything is the Makefile at the project The simplest way to train one combo is the project-root Makefile:
root: ``make`` does the full chain, ``make rl`` rebuilds whatever's
needed up to that point, etc. The individual stages below are kept
explicit for cases where you want to tune a single step.
```bash ```bash
# 1. Sim demos with the active-scan + Strömbom teacher under LiDAR make DRIVE=differential WORLD=field # demos → bc → rl → eval
# perception. K=4 frame stack so the MLP has temporal context. make DRIVE=differential WORLD=field_round
python -m training.bc.collect --teacher strombom \ make train_all # all four combos sequentially
--out training/bc/demos.npz --seeds-per-n 15 --subsample 3 --frame-stack 4 ```
# 2. Behaviour-clone. The individual stages below are kept explicit for cases where you
python -m training.bc.pretrain --demos training/bc/demos.npz \ want to tune a single step.
--out training/runs/bc --epochs 60 --net-arch 512,512
```bash
# 1. Sim demos with the active-scan + universal teacher under LiDAR
# perception. K=4 frame stack so the MLP has temporal context.
python -m training.bc.collect \
--teacher universal --drive-mode differential --world field \
--out training/bc/demos_differential_field.npz \
--seeds-per-n 15 --subsample 3 --frame-stack 4
# 2. Behaviour-clone the demos.
python -m training.bc.pretrain \
--demos training/bc/demos_differential_field.npz \
--out training/runs/bc_differential_field \
--epochs 60 --net-arch 512,512
# 3. KL-regularised PPO fine-tune of bc. # 3. KL-regularised PPO fine-tune of bc.
python -m training.rl.train \ python -m training.rl.train \
--bc training/runs/bc --out training/runs/rl \ --bc training/runs/bc_differential_field \
--out training/runs/rl_differential_field \
--drive-mode differential --world field \
--total-timesteps 1000000 --total-timesteps 1000000
# 4. Multi-seed eval (env-side, fast). # 4. Multi-seed eval (env-side, fast).
python -m training.eval --policy training/runs/rl \ python -m training.eval --policy training/runs/rl_differential_field \
--drive-mode differential --world field \
--max-flock 10 --max-steps 15000 --n-seeds 10 --max-flock 10 --max-steps 15000 --n-seeds 10
``` ```
`bc/pretrain.py` saves the **best-val_cos** snapshot, not the final `bc/pretrain.py` saves the **best-val_cos** snapshot, not the final
epoch — multi-modal teachers make training noisy and the last epoch is epoch — multi-modal teachers make training noisy and the last epoch
often worse than an earlier one. is often worse than an earlier one.
`rl/train.py` loads BC weights into both a trainable policy and a `rl/train.py` loads BC weights into both a trainable policy and a
frozen reference, fixes `log_std` small, and adds `β · KL(π‖π_ref)` to frozen reference, fixes `log_std` small, and adds `β · KL(π‖π_ref)` to
the loss so the policy can only move within a trust region around BC. the loss so the policy can only move within a trust region around BC.
See the file header for hyperparameter rationale. See the file header for hyperparameter rationale.
## Available analytic teachers ## Mecanum retraining
For mecanum runs, pass `--use-webots-preset`. Both `collect.py` and
`train.py` detect `--drive-mode mecanum` and switch to the
`HERDING_MEC_WEBOTS` preset, which matches the physical-roller
Webots proto's strafe efficiency (~0.4) and forward bleed (~0.28).
Training without this preset produces a policy that herds in textbook
gym mecanum but not in Webots.
## Analytic teachers
| Name | What it does | Notes | | Name | What it does | Notes |
|---|---|---| |---|---|---|
| `strombom` | Strömbom 2014 — collect when flock is scattered, drive CoM otherwise | Default; works for n=110 under tight cohesion | | `strombom` | Strömbom 2014 — collect when flock is scattered, drive CoM otherwise | Round-world aware (radially-inward fallback when natural target lies outside the curved boundary) |
| `sequential` | Pick the sheep closest to the pen and drive only it | Alternative; needs loose-cohesion regime | | `sequential` | Three-phase: collect, drive, then single-target push for the last 12 stragglers | Alternative to strombom |
| `universal` | Strömbom core + mecanum omega + last-straggler recovery | Used as the BC demo teacher |
Both are wrapped at demo-collection time in All three are wrapped at demo-collection time in
`herding/control/active_scan.py:ActiveScanTeacher`, which adds an `herding/control/active_scan.py:ActiveScanTeacher`, which adds an
opening in-place rotation, walk-to-centre when the LiDAR sees opening in-place rotation, walk-to-centre when the LiDAR sees
nothing, and near-sheep speed modulation (same modulation nothing, and near-sheep speed modulation (same modulation
@@ -83,8 +108,11 @@ inference).
## Evaluating analytic teachers directly ## Evaluating analytic teachers directly
```bash
python -m training.eval --policy strombom \
--drive-mode differential --world field \
--max-flock 10 --max-steps 15000 --n-seeds 10
python -m training.eval --policy sequential \
--drive-mode differential --world field_round \
--max-flock 10 --max-steps 15000 --n-seeds 10
``` ```
python -m training.eval --policy strombom --max-flock 10 --max-steps 15000 --n-seeds 10
python -m training.eval --policy sequential --max-flock 10 --max-steps 15000 --n-seeds 10
```
+112 -26
View File
@@ -8,8 +8,8 @@ the same partial-obs view the student will have at deployment.
Usage:: Usage::
python -m training.bc.collect --teacher strombom \\ python -m training.bc.collect --teacher universal --drive-mode differential \\
--out training/bc/demos.npz --frame-stack 4 --world field --out training/bc/demos_differential_field.npz --frame-stack 4
""" """
from __future__ import annotations from __future__ import annotations
@@ -21,22 +21,9 @@ from pathlib import Path
import numpy as np import numpy as np
# Early CLI parse so we can configure geometry before heavy imports. # Configure field geometry before other herding imports read it at module level.
# (argparse is used again below for the full parse; this is a lightweight from herding.world.geometry import configure_from_args as _configure_from_args
# pre-pass that only reads --world.) _configure_from_args()
_pre_argv = [a for a in os.sys.argv[1:]]
_pre_world = None
for i, a in enumerate(_pre_argv):
if a == "--world" and i + 1 < len(_pre_argv):
_pre_world = _pre_argv[i + 1]
break
if a.startswith("--world="):
_pre_world = a.split("=", 1)[1]
break
if _pre_world is not None:
from herding.world.geometry import configure as _geo_configure
_geo_configure(_pre_world)
os.environ["HERDING_WORLD"] = _pre_world
from herding.control.active_scan import ActiveScanTeacher from herding.control.active_scan import ActiveScanTeacher
from herding.world.geometry import PEN_ENTRY, FIELD_SHAPE from herding.world.geometry import PEN_ENTRY, FIELD_SHAPE
@@ -83,10 +70,17 @@ def _call_teacher(fn, dog_xy, dog_heading, sheep_positions, pen_target,
def collect_one(n_sheep: int, seed: int, max_steps: int, subsample: int, def collect_one(n_sheep: int, seed: int, max_steps: int, subsample: int,
teacher_fn, frame_stack: int = 1, privileged: bool = False, teacher_fn, frame_stack: int = 1, privileged: bool = False,
drive_mode: str = "differential"): drive_mode: str = "differential", herding_cfg=None,
actor_policy=None):
"""Collect (obs, teacher_action) pairs from one episode.
``actor_policy`` (DAgger mode): a callable ``policy(obs) -> action`` that
drives the env. The teacher still labels each visited state. If ``None``
(default), the teacher drives.
"""
env = HerdingEnv(n_sheep=n_sheep, max_steps=max_steps, env = HerdingEnv(n_sheep=n_sheep, max_steps=max_steps,
difficulty=1.0, seed=seed, frame_stack=frame_stack, difficulty=1.0, seed=seed, frame_stack=frame_stack,
drive_mode=drive_mode) drive_mode=drive_mode, herding_cfg=herding_cfg)
obs, _ = env.reset(seed=seed) obs, _ = env.reset(seed=seed)
obs_list, action_list = [], [] obs_list, action_list = [], []
scan_teacher = ActiveScanTeacher(teacher_fn) scan_teacher = ActiveScanTeacher(teacher_fn)
@@ -108,13 +102,16 @@ def collect_one(n_sheep: int, seed: int, max_steps: int, subsample: int,
) )
vx, vy, omega, _mode = result vx, vy, omega, _mode = result
if drive_mode == "mecanum": if drive_mode == "mecanum":
action = np.array([vx, vy, omega], dtype=np.float32) teacher_action = np.array([vx, vy, omega], dtype=np.float32)
else: else:
action = np.array([vx, vy], dtype=np.float32) teacher_action = np.array([vx, vy], dtype=np.float32)
if step % subsample == 0: if step % subsample == 0:
obs_list.append(obs.copy()) obs_list.append(obs.copy())
action_list.append(action.copy()) action_list.append(teacher_action.copy())
obs, _r, term, trunc, _info = env.step(action) # In DAgger mode the policy drives; otherwise the teacher does.
step_action = (actor_policy(obs) if actor_policy is not None
else teacher_action)
obs, _r, term, trunc, _info = env.step(step_action)
if term or trunc: if term or trunc:
break break
success = bool(env.sheep_penned.all()) success = bool(env.sheep_penned.all())
@@ -128,7 +125,9 @@ def collect_one(n_sheep: int, seed: int, max_steps: int, subsample: int,
def main(): def main():
parser = argparse.ArgumentParser() parser = argparse.ArgumentParser()
parser.add_argument("--out", default="training/bc/demos.npz") parser.add_argument("--out", required=True,
help="Output .npz path (convention: "
"training/bc/demos_<drive>_<world>.npz).")
parser.add_argument("--n-sheep-list", default="1,2,3,5,8,10") parser.add_argument("--n-sheep-list", default="1,2,3,5,8,10")
parser.add_argument("--seeds-per-n", type=int, default=15) parser.add_argument("--seeds-per-n", type=int, default=15)
parser.add_argument("--max-steps", type=int, default=30000) parser.add_argument("--max-steps", type=int, default=30000)
@@ -153,6 +152,24 @@ def main():
help="World shape. If not set, uses HERDING_WORLD " help="World shape. If not set, uses HERDING_WORLD "
"env var or defaults to 'field'. Must be set " "env var or defaults to 'field'. Must be set "
"before geometry is imported.") "before geometry is imported.")
# Domain randomisation — applied to the gym env during collection so
# the teacher demonstrates under the same noise the policy will face.
parser.add_argument("--fp-rate", type=float, default=0.0,
help="Mean false-positive detections injected per "
"step (Poisson λ). 0 = clean sim (default).")
parser.add_argument("--action-smooth", type=float, default=0.0,
help="EMA coefficient on dog actions (0 = none). "
"Set to 0.55 to match the Webots controller.")
parser.add_argument("--wheel-slip-std", type=float, default=0.0,
help="Gaussian noise (rad/s) on wheel speeds for "
"mecanum dynamics domain randomisation.")
parser.add_argument("--dagger-policy", default=None,
help="Path to a BC/PPO policy directory. When set, "
"the policy drives the env (DAgger) while the "
"teacher labels every visited state.")
parser.add_argument("--use-webots-preset", action="store_true",
help="Use HERDING_WEBOTS preset (140° FOV + tight "
"tracker). Match this to deployment for DAgger.")
args = parser.parse_args() args = parser.parse_args()
# Validate --world matches geometry (already configured by the # Validate --world matches geometry (already configured by the
@@ -161,6 +178,74 @@ def main():
print(f"[demos] WARNING: --world={args.world} but geometry is " print(f"[demos] WARNING: --world={args.world} but geometry is "
f"'{FIELD_SHAPE}'. This should not happen — file a bug.") f"'{FIELD_SHAPE}'. This should not happen — file a bug.")
from herding.config import (
HerdingConfig, HERDING_WEBOTS, HERDING_MEC_WEBOTS, HERDING_MEC_WEBOTS_360,
DomainRandomConfig, RobotConfig,
)
if args.use_webots_preset:
# Mecanum uses the 360° preset (the deployable mecanum target);
# diff drive keeps the canonical 140° preset.
if args.drive_mode == "mecanum":
base = HERDING_MEC_WEBOTS_360
preset_name = "HERDING_MEC_WEBOTS_360"
else:
base = HERDING_WEBOTS
preset_name = "HERDING_WEBOTS"
# Small compass noise for mecanum training (robustness margin
# for the Webots compass sensor).
compass_std = 0.1 if args.drive_mode == "mecanum" else 0.0
herding_cfg = base.replace(
domain_random=DomainRandomConfig(
fp_rate=args.fp_rate,
wheel_slip_std=args.wheel_slip_std,
compass_noise_std=compass_std,
),
robot=RobotConfig(
action_smooth=args.action_smooth,
strafe_efficiency=base.robot.strafe_efficiency,
strafe_to_forward_bleed=base.robot.strafe_to_forward_bleed,
),
)
print(f"[demos] {preset_name} preset + DR: fp_rate={args.fp_rate} "
f"action_smooth={args.action_smooth} wheel_slip_std={args.wheel_slip_std} "
f"strafe_eff={herding_cfg.robot.strafe_efficiency:.2f} "
f"compass_noise={compass_std}")
else:
herding_cfg = None
if args.fp_rate > 0.0 or args.action_smooth > 0.0 or args.wheel_slip_std > 0.0:
herding_cfg = HerdingConfig(
domain_random=DomainRandomConfig(
fp_rate=args.fp_rate,
wheel_slip_std=args.wheel_slip_std,
),
robot=RobotConfig(action_smooth=args.action_smooth),
)
print(f"[demos] domain-random: fp_rate={args.fp_rate} "
f"action_smooth={args.action_smooth} "
f"wheel_slip_std={args.wheel_slip_std}")
actor_policy = None
if args.dagger_policy is not None:
# DAgger: failures are the most valuable data (off-policy states
# where the student needs teacher correction). Always keep them.
args.keep_failures = True
from stable_baselines3 import PPO
from pathlib import Path as _P
run = _P(args.dagger_policy)
for name in ("policy.zip", "final.zip"):
if (run / name).exists():
zip_path = run / name
break
else:
raise FileNotFoundError(
f"No policy found in {run} (tried policy.zip, final.zip)")
_model = PPO.load(str(zip_path), device="auto")
print(f"[demos] DAgger mode: actor = {zip_path}")
def actor_policy(obs):
obs_b = np.asarray(obs, dtype=np.float32).reshape(1, -1)
a, _ = _model.predict(obs_b, deterministic=True)
return a[0]
teacher_fn = TEACHERS[args.teacher] teacher_fn = TEACHERS[args.teacher]
print(f"[demos] teacher: {args.teacher} world: {FIELD_SHAPE}") print(f"[demos] teacher: {args.teacher} world: {FIELD_SHAPE}")
@@ -177,7 +262,8 @@ def main():
obs, actions, success, total_steps = collect_one( obs, actions, success, total_steps = collect_one(
n, seed, args.max_steps, args.subsample, teacher_fn, n, seed, args.max_steps, args.subsample, teacher_fn,
frame_stack=args.frame_stack, privileged=args.privileged, frame_stack=args.frame_stack, privileged=args.privileged,
drive_mode=args.drive_mode, drive_mode=args.drive_mode, herding_cfg=herding_cfg,
actor_policy=actor_policy,
) )
n_total += 1 n_total += 1
if success: if success:
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
+10 -7
View File
@@ -12,8 +12,8 @@ Output zip is loadable by ``PPO.load(...)`` and consumed by
Usage:: Usage::
python -m training.bc.pretrain \\ python -m training.bc.pretrain \\
--demos training/bc/demos.npz \\ --demos training/bc/demos_differential_field.npz \\
--out training/runs/bc --out training/runs/bc_differential_field
""" """
from __future__ import annotations from __future__ import annotations
@@ -55,7 +55,7 @@ def build_model(net_arch_pi, net_arch_vf, log_std_init: float,
return model, env return model, env
def policy_forward_mean(policy, obs_batch): def forward_mean(policy, obs_batch):
"""Return the deterministic mean action for an obs batch. """Return the deterministic mean action for an obs batch.
SB3's ActorCriticPolicy routes ``forward`` through a Distribution SB3's ActorCriticPolicy routes ``forward`` through a Distribution
@@ -70,8 +70,11 @@ def policy_forward_mean(policy, obs_batch):
def main(): def main():
parser = argparse.ArgumentParser() parser = argparse.ArgumentParser()
parser.add_argument("--demos", default="training/bc/demos.npz") parser.add_argument("--demos", required=True,
parser.add_argument("--out", default="training/runs/bc") help="Path to demos .npz collected by training.bc.collect.")
parser.add_argument("--out", required=True,
help="Output directory (convention: "
"training/runs/bc_<drive>_<world>).")
parser.add_argument("--epochs", type=int, default=60) parser.add_argument("--epochs", type=int, default=60)
parser.add_argument("--batch-size", type=int, default=256) parser.add_argument("--batch-size", type=int, default=256)
parser.add_argument("--lr", type=float, default=1e-3) parser.add_argument("--lr", type=float, default=1e-3)
@@ -174,7 +177,7 @@ def main():
ob_batch = ob_batch.to(args.device) ob_batch = ob_batch.to(args.device)
act_batch = act_batch.to(args.device) act_batch = act_batch.to(args.device)
optimizer.zero_grad() optimizer.zero_grad()
mean_action = policy_forward_mean(policy, ob_batch) mean_action = forward_mean(policy, ob_batch)
loss, mse_val, cos_val = combined_loss(mean_action, act_batch) loss, mse_val, cos_val = combined_loss(mean_action, act_batch)
loss.backward() loss.backward()
optimizer.step() optimizer.step()
@@ -193,7 +196,7 @@ def main():
for ob_batch, act_batch in val_loader: for ob_batch, act_batch in val_loader:
ob_batch = ob_batch.to(args.device) ob_batch = ob_batch.to(args.device)
act_batch = act_batch.to(args.device) act_batch = act_batch.to(args.device)
mean_action = policy_forward_mean(policy, ob_batch) mean_action = forward_mean(policy, ob_batch)
bs = ob_batch.size(0) bs = ob_batch.size(0)
val_total += nn.functional.mse_loss( val_total += nn.functional.mse_loss(
mean_action, act_batch, reduction="sum", mean_action, act_batch, reduction="sum",
+38 -19
View File
@@ -18,21 +18,9 @@ from statistics import mean
import numpy as np import numpy as np
# Early CLI pre-parse for --world so geometry is configured before # Configure field geometry before other herding imports read it at module level.
# other herding.* modules are imported. from herding.world.geometry import configure_from_args as _configure_from_args
_pre_argv = [a for a in os.sys.argv[1:]] _configure_from_args()
_pre_world = None
for i, a in enumerate(_pre_argv):
if a == "--world" and i + 1 < len(_pre_argv):
_pre_world = _pre_argv[i + 1]
break
if a.startswith("--world="):
_pre_world = a.split("=", 1)[1]
break
if _pre_world is not None:
from herding.world.geometry import configure as _geo_configure
_geo_configure(_pre_world)
os.environ["HERDING_WORLD"] = _pre_world
from herding.world.geometry import MAX_SHEEP, PEN_ENTRY from herding.world.geometry import MAX_SHEEP, PEN_ENTRY
from herding.control.sequential import compute_action as sequential_action from herding.control.sequential import compute_action as sequential_action
@@ -71,16 +59,36 @@ def make_strombom_predictor(drive_mode: str = "differential"):
return make_analytic_predictor(strombom_action, drive_mode) return make_analytic_predictor(strombom_action, drive_mode)
def make_policy_predictor(model, vecnorm): def make_policy_predictor(model, vecnorm, recurrent: bool = False):
state = {"lstm": None, "first": True}
def _predict(_env, obs): def _predict(_env, obs):
obs_b = np.asarray(obs, dtype=np.float32).reshape(1, -1) obs_b = np.asarray(obs, dtype=np.float32).reshape(1, -1)
if vecnorm is not None: if vecnorm is not None:
obs_b = vecnorm.normalize_obs(obs_b) obs_b = vecnorm.normalize_obs(obs_b)
action, _ = model.predict(obs_b, deterministic=True) if recurrent:
episode_start = np.array([state["first"]], dtype=bool)
action, new_state = model.predict(
obs_b, state=state["lstm"], episode_start=episode_start,
deterministic=True,
)
state["lstm"] = new_state
state["first"] = False
else:
action, _ = model.predict(obs_b, deterministic=True)
return action[0] return action[0]
return _predict return _predict
def _reset_recurrent(predict_fn):
"""Reset the recurrent state between episodes."""
# The closure stores `state` dict; reach in via __closure__.
for cell in predict_fn.__closure__ or []:
if isinstance(cell.cell_contents, dict) and "lstm" in cell.cell_contents:
cell.cell_contents["lstm"] = None
cell.cell_contents["first"] = True
return
def main(): def main():
parser = argparse.ArgumentParser() parser = argparse.ArgumentParser()
parser.add_argument("--policy", required=True, parser.add_argument("--policy", required=True,
@@ -122,7 +130,17 @@ def main():
f"No checkpoint found in {run} " f"No checkpoint found in {run} "
f"(tried policy.zip, final.zip)" f"(tried policy.zip, final.zip)"
) )
model = PPO.load(str(zip_path), device="auto") # Try RecurrentPPO first (sb3-contrib) for LSTM policies, then
# fall back to PPO for MLP policies.
recurrent = False
model = None
try:
from sb3_contrib import RecurrentPPO
model = RecurrentPPO.load(str(zip_path), device="auto")
recurrent = True
print(f"[eval] loaded RecurrentPPO (LSTM) policy")
except Exception:
model = PPO.load(str(zip_path), device="auto")
from herding.perception.obs import OBS_DIM as _SINGLE from herding.perception.obs import OBS_DIM as _SINGLE
policy_obs_dim = int(model.observation_space.shape[0]) policy_obs_dim = int(model.observation_space.shape[0])
if policy_obs_dim % _SINGLE == 0 and policy_obs_dim // _SINGLE >= 1: if policy_obs_dim % _SINGLE == 0 and policy_obs_dim // _SINGLE >= 1:
@@ -139,7 +157,7 @@ def main():
vecnorm = pickle.load(f) vecnorm = pickle.load(f)
vecnorm.training = False vecnorm.training = False
vecnorm.norm_reward = False vecnorm.norm_reward = False
predict = make_policy_predictor(model, vecnorm) predict = make_policy_predictor(model, vecnorm, recurrent=recurrent)
# Infer drive_mode from policy action dim if using a learned policy. # Infer drive_mode from policy action dim if using a learned policy.
if args.policy not in ("strombom", "sequential"): if args.policy not in ("strombom", "sequential"):
@@ -161,6 +179,7 @@ def main():
env = HerdingEnv(n_sheep=n, max_steps=args.max_steps, env = HerdingEnv(n_sheep=n, max_steps=args.max_steps,
difficulty=args.difficulty, seed=seed, difficulty=args.difficulty, seed=seed,
frame_stack=frame_stack, drive_mode=drive_mode) frame_stack=frame_stack, drive_mode=drive_mode)
_reset_recurrent(predict)
r = rollout(env, predict, args.max_steps) r = rollout(env, predict, args.max_steps)
successes.append(int(r["success"])) successes.append(int(r["success"]))
steps.append(r["steps"]) steps.append(r["steps"])
+89 -5
View File
@@ -28,7 +28,7 @@ from gymnasium import spaces
from herding.world.diffdrive import ( from herding.world.diffdrive import (
heading_speed_to_wheels, kinematics_step, heading_speed_to_wheels, kinematics_step,
mecanum_kinematics_step, velocity_to_mecanum_wheels, velocity_to_wheels, mecanum_step, velocity_to_mecanum_wheels, velocity_to_wheels,
) )
from herding.world.flocking_sim import ( from herding.world.flocking_sim import (
FLEE_SPEED, MAX_SPEED, WANDER_SPEED, compute_heading_speed, FLEE_SPEED, MAX_SPEED, WANDER_SPEED, compute_heading_speed,
@@ -40,13 +40,14 @@ from herding.world.geometry import (
GATE_X, GATE_Y, MAX_SHEEP, GATE_X, GATE_Y, MAX_SHEEP,
PEN_ENTRY, PEN_X, PEN_Y, PEN_ENTRY, PEN_X, PEN_Y,
SHEEP_MAX_WHEEL_OMEGA, SHEEP_WHEEL_BASE, SHEEP_WHEEL_RADIUS, SHEEP_MAX_WHEEL_OMEGA, SHEEP_WHEEL_BASE, SHEEP_WHEEL_RADIUS,
WEBOTS_DT, clip_to_field, is_penned_position, WEBOTS_DT, clip_to_field, is_penned,
) )
from herding.perception.lidar_perception import detections_from_scan from herding.perception.lidar_perception import detections_from_scan
from herding.perception.lidar_sim import simulate_scan from herding.perception.lidar_sim import simulate_scan
from herding.perception.obs import OBS_DIM, build_obs from herding.perception.obs import OBS_DIM, build_obs
from herding.perception.sheep_tracker import SheepTracker from herding.perception.sheep_tracker import SheepTracker
from herding.control.strombom import compute_action as strombom_action from herding.control.strombom import compute_action as strombom_action
from herding.config import HerdingConfig
class HerdingEnv(gym.Env): class HerdingEnv(gym.Env):
@@ -87,13 +88,24 @@ class HerdingEnv(gym.Env):
use_lidar: bool = True, use_lidar: bool = True,
frame_stack: int = 1, frame_stack: int = 1,
drive_mode: str = "differential", drive_mode: str = "differential",
herding_cfg: Optional[HerdingConfig] = None,
): ):
super().__init__() super().__init__()
# Store the config; fall back to defaults when None.
self._herding_cfg = herding_cfg
# Apply robot config overrides — these shadow the class attributes
# so that per-instance configuration is possible without touching
# the class-level defaults used by unconfigured instances.
if herding_cfg is not None:
self.ACTION_SMOOTH = herding_cfg.robot.action_smooth
# ``use_lidar=True`` (default): obs and imitation-reward teacher # ``use_lidar=True`` (default): obs and imitation-reward teacher
# see only LiDAR-perceived positions via a tracker, matching the # see only LiDAR-perceived positions via a tracker, matching the
# Webots controller. ``False`` exposes ground truth for ablation. # Webots controller. ``False`` exposes ground truth for ablation.
self._use_lidar = bool(use_lidar) self._use_lidar = bool(use_lidar)
self._tracker = SheepTracker() if self._use_lidar else None tracker_cfg = herding_cfg.tracker if herding_cfg is not None else None
self._tracker = SheepTracker(tracker_cfg=tracker_cfg) if self._use_lidar else None
self._np_rng_lidar: Optional[np.random.Generator] = None self._np_rng_lidar: Optional[np.random.Generator] = None
# Frame stacking: the policy receives the last K obs concatenated, # Frame stacking: the policy receives the last K obs concatenated,
@@ -261,6 +273,14 @@ class HerdingEnv(gym.Env):
vx, vy = float(self.smoothed_action[0]), float(self.smoothed_action[1]) vx, vy = float(self.smoothed_action[0]), float(self.smoothed_action[1])
omega = float(self.smoothed_action[2]) if self._action_dim >= 3 else 0.0 omega = float(self.smoothed_action[2]) if self._action_dim >= 3 else 0.0
# Domain randomisation: compass (heading) noise.
dr = (self._herding_cfg.domain_random
if self._herding_cfg is not None else None)
slip_std = dr.wheel_slip_std if dr is not None else 0.0
if dr is not None and dr.compass_noise_std > 0.0 and self._np_rng_lidar is not None:
self.dog_heading = float(self.dog_heading + self._np_rng_lidar.normal(
0.0, dr.compass_noise_std))
# Safety supervisor — dog stays north of the gate. # Safety supervisor — dog stays north of the gate.
if self.dog_y < DOG_SOUTH_LIMIT and vy < 0.0: if self.dog_y < DOG_SOUTH_LIMIT and vy < 0.0:
vx, vy = 0.0, 1.0 vx, vy = 0.0, 1.0
@@ -276,12 +296,22 @@ class HerdingEnv(gym.Env):
k_turn=4.0, k_turn=4.0,
wheel_base=DOG_WHEEL_BASE, wheel_base=DOG_WHEEL_BASE,
) )
self.dog_x, self.dog_y, self.dog_heading = mecanum_kinematics_step( robot_cfg = (self._herding_cfg.robot
if self._herding_cfg is not None else None)
strafe_efficiency = (robot_cfg.strafe_efficiency
if robot_cfg is not None else 1.0)
strafe_bleed = (robot_cfg.strafe_to_forward_bleed
if robot_cfg is not None else 0.0)
self.dog_x, self.dog_y, self.dog_heading = mecanum_step(
self.dog_x, self.dog_y, self.dog_heading, self.dog_x, self.dog_y, self.dog_heading,
w_fl, w_fr, w_rl, w_rr, w_fl, w_fr, w_rl, w_rr,
DOG_WHEEL_RADIUS, DOG_WHEEL_RADIUS,
DOG_WHEEL_BASE_X / 2.0, DOG_WHEEL_BASE_Y / 2.0, DOG_WHEEL_BASE_X / 2.0, DOG_WHEEL_BASE_Y / 2.0,
WEBOTS_DT, WEBOTS_DT,
slip_std=slip_std,
rng=self._np_rng_lidar,
strafe_efficiency=strafe_efficiency,
strafe_to_forward_bleed=strafe_bleed,
) )
else: else:
wL, wR = velocity_to_wheels( wL, wR = velocity_to_wheels(
@@ -294,6 +324,8 @@ class HerdingEnv(gym.Env):
self.dog_x, self.dog_y, self.dog_heading = kinematics_step( self.dog_x, self.dog_y, self.dog_heading = kinematics_step(
self.dog_x, self.dog_y, self.dog_heading, self.dog_x, self.dog_y, self.dog_heading,
wL, wR, DOG_WHEEL_RADIUS, DOG_WHEEL_BASE, WEBOTS_DT, wL, wR, DOG_WHEEL_RADIUS, DOG_WHEEL_BASE, WEBOTS_DT,
slip_std=slip_std,
rng=self._np_rng_lidar,
) )
self.dog_x, self.dog_y = clip_to_field(self.dog_x, self.dog_y, margin=0.3) self.dog_x, self.dog_y = clip_to_field(self.dog_x, self.dog_y, margin=0.3)
# Extra constraint: dog stays north of the gate area. # Extra constraint: dog stays north of the gate area.
@@ -305,7 +337,7 @@ class HerdingEnv(gym.Env):
self._step_one_sheep(i) self._step_one_sheep(i)
for i in range(self.n_sheep): for i in range(self.n_sheep):
if (not self.sheep_penned[i] if (not self.sheep_penned[i]
and is_penned_position(self.sheep_x[i], self.sheep_y[i])): and is_penned(self.sheep_x[i], self.sheep_y[i])):
self.sheep_penned[i] = True self.sheep_penned[i] = True
# LiDAR perception runs after sheep move; feeds the obs and the # LiDAR perception runs after sheep move; feeds the obs and the
@@ -460,16 +492,68 @@ class HerdingEnv(gym.Env):
for i in range(self.n_sheep)] for i in range(self.n_sheep)]
def _update_tracker(self) -> None: def _update_tracker(self) -> None:
lidar_cfg = (self._herding_cfg.lidar
if self._herding_cfg is not None else None)
detection_cfg = (self._herding_cfg.detection
if self._herding_cfg is not None else None)
ranges = simulate_scan( ranges = simulate_scan(
self.dog_x, self.dog_y, self.dog_heading, self.dog_x, self.dog_y, self.dog_heading,
self._all_sheep_xy(), self._all_sheep_xy(),
rng=self._np_rng_lidar, rng=self._np_rng_lidar,
lidar_cfg=lidar_cfg,
) )
detections = detections_from_scan( detections = detections_from_scan(
ranges, self.dog_x, self.dog_y, self.dog_heading, ranges, self.dog_x, self.dog_y, self.dog_heading,
detection_cfg=detection_cfg,
lidar_cfg=lidar_cfg,
) )
# Domain randomisation: inject false-positive detections near static
# features to mimic the spurious clusters Webots' physical LiDAR
# produces from real 3D geometry (walls, posts, fence rails).
dr = (self._herding_cfg.domain_random
if self._herding_cfg is not None else None)
if dr is not None and dr.fp_rate > 0.0 and self._np_rng_lidar is not None:
detections = list(detections)
detections.extend(
self._sample_false_positives(dr.fp_rate, dr.fp_std_pos))
self._tracker.update(detections) self._tracker.update(detections)
# Static feature anchor points used for FP injection.
# The rectangular list covers gate posts and field corners; the round
# list covers just the gate posts (the circular wall is handled separately).
_FP_ANCHORS_RECT = (
(10.0, -15.0), (13.0, -15.0), # gate posts
(15.0, 15.0), (15.0, -15.0),
(-15.0, 15.0), (-15.0, -15.0), # field corners
(15.0, 0.0), (-15.0, 0.0), # mid-wall returns
(0.0, 15.0), (0.0, -15.0),
)
_FP_ANCHORS_ROUND = (
(0.0, -15.0), # gate centre
(-1.5, -15.0), (1.5, -15.0), # gate posts
(0.0, 15.0), # north wall
(10.6, -10.6), (-10.6, -10.6), # circular wall quadrants
)
def _sample_false_positives(
self, fp_rate: float, fp_std: float,
) -> list[tuple[float, float]]:
"""Return a list of spurious (x, y) detections for this step."""
from herding.world.geometry import FIELD_SHAPE
anchors = (self._FP_ANCHORS_ROUND
if FIELD_SHAPE == "field_round"
else self._FP_ANCHORS_RECT)
n_fps = int(self._np_rng_lidar.poisson(fp_rate))
if n_fps == 0:
return []
fps = []
chosen = self._np_rng_lidar.integers(0, len(anchors), size=n_fps)
noise = self._np_rng_lidar.normal(0.0, fp_std, size=(n_fps, 2))
for k in range(n_fps):
ax, ay = anchors[chosen[k]]
fps.append((float(ax + noise[k, 0]), float(ay + noise[k, 1])))
return fps
def perceived_positions(self) -> dict[str, tuple[float, float]]: def perceived_positions(self) -> dict[str, tuple[float, float]]:
"""What the controller would "see" this step: tracker output in """What the controller would "see" this step: tracker output in
LiDAR mode, ground truth in privileged mode. Used by demo LiDAR mode, ground truth in privileged mode. Used by demo
+66 -20
View File
@@ -23,22 +23,9 @@ import argparse
import os import os
from pathlib import Path from pathlib import Path
# Early CLI pre-parse for --world so geometry is configured before any # Configure field geometry before other herding imports read it at module level.
# herding.* / training.* import binds geometry constants. Matches the from herding.world.geometry import configure_from_args as _configure_from_args
# pattern used by training.bc.collect and training.eval. _configure_from_args()
_pre_argv = [a for a in os.sys.argv[1:]]
_pre_world = None
for i, a in enumerate(_pre_argv):
if a == "--world" and i + 1 < len(_pre_argv):
_pre_world = _pre_argv[i + 1]
break
if a.startswith("--world="):
_pre_world = a.split("=", 1)[1]
break
if _pre_world is not None:
from herding.world.geometry import configure as _geo_configure
_geo_configure(_pre_world)
os.environ["HERDING_WORLD"] = _pre_world
import numpy as np import numpy as np
import torch as th import torch as th
@@ -59,11 +46,12 @@ from training.herding_env import HerdingEnv
def _make_env(rank: int, seed: int, frame_stack: int, def _make_env(rank: int, seed: int, frame_stack: int,
drive_mode: str = "differential", drive_mode: str = "differential",
difficulty: float = 1.0, difficulty: float = 1.0,
max_n_sheep: int = 10): max_n_sheep: int = 10,
herding_cfg=None):
def _thunk(): def _thunk():
env = HerdingEnv(seed=seed + rank, frame_stack=frame_stack, env = HerdingEnv(seed=seed + rank, frame_stack=frame_stack,
drive_mode=drive_mode, difficulty=difficulty, drive_mode=drive_mode, difficulty=difficulty,
max_n_sheep=max_n_sheep) max_n_sheep=max_n_sheep, herding_cfg=herding_cfg)
env = Monitor(env, info_keywords=("is_success", "n_sheep", "n_penned")) env = Monitor(env, info_keywords=("is_success", "n_sheep", "n_penned"))
return env return env
return _thunk return _thunk
@@ -241,6 +229,13 @@ def main() -> None:
choices=["field", "field_round"], choices=["field", "field_round"],
help="World shape. If not set, uses HERDING_WORLD " help="World shape. If not set, uses HERDING_WORLD "
"env var or defaults to 'field'.") "env var or defaults to 'field'.")
# Domain randomisation
parser.add_argument("--fp-rate", type=float, default=0.0,
help="Mean false-positive detections per step (Poisson λ).")
parser.add_argument("--action-smooth", type=float, default=0.0,
help="EMA on dog actions (0=none, 0.55=Webots match).")
parser.add_argument("--wheel-slip-std", type=float, default=0.0,
help="Gaussian wheel-speed noise std (rad/s).")
args = parser.parse_args() args = parser.parse_args()
# --world was already honoured in the early pre-parse above; here we # --world was already honoured in the early pre-parse above; here we
# just sanity-check that the final argparse view agrees. # just sanity-check that the final argparse view agrees.
@@ -280,15 +275,66 @@ def main() -> None:
drive_mode = "differential" drive_mode = "differential"
print(f"[rl] drive_mode={drive_mode} (BC action_dim={bc_action_dim})") print(f"[rl] drive_mode={drive_mode} (BC action_dim={bc_action_dim})")
from herding.config import (
HerdingConfig, HERDING_MEC_WEBOTS_360, DomainRandomConfig, RobotConfig,
)
herding_cfg = None
# Mecanum trains under HERDING_MEC_WEBOTS_360 (360° LiDAR +
# kinematic-matched strafe scaling + small compass-noise DR).
is_mecanum = (drive_mode == "mecanum")
if is_mecanum or args.fp_rate > 0.0 or args.action_smooth > 0.0 or args.wheel_slip_std > 0.0:
if is_mecanum:
base = HERDING_MEC_WEBOTS_360
strafe_eff = base.robot.strafe_efficiency
strafe_bleed = base.robot.strafe_to_forward_bleed
compass_std = 0.1 # heading robustness DR
else:
base = None
strafe_eff = 1.0
strafe_bleed = 0.0
compass_std = 0.0
if is_mecanum:
herding_cfg = base.replace(
domain_random=DomainRandomConfig(
fp_rate=args.fp_rate,
wheel_slip_std=args.wheel_slip_std,
compass_noise_std=compass_std,
),
robot=RobotConfig(
action_smooth=args.action_smooth,
strafe_efficiency=strafe_eff,
strafe_to_forward_bleed=strafe_bleed,
),
)
else:
herding_cfg = HerdingConfig(
domain_random=DomainRandomConfig(
fp_rate=args.fp_rate,
wheel_slip_std=args.wheel_slip_std,
),
robot=RobotConfig(
action_smooth=args.action_smooth,
strafe_efficiency=strafe_eff,
strafe_to_forward_bleed=strafe_bleed,
),
)
print(f"[rl] domain-random: fp_rate={args.fp_rate} "
f"action_smooth={args.action_smooth} "
f"wheel_slip_std={args.wheel_slip_std} "
f"strafe_eff={strafe_eff:.2f} strafe_bleed={strafe_bleed:.2f} "
f"compass_noise={compass_std}")
env_fns = [_make_env(i, args.seed, frame_stack, drive_mode, env_fns = [_make_env(i, args.seed, frame_stack, drive_mode,
difficulty=args.difficulty, difficulty=args.difficulty,
max_n_sheep=args.max_n_sheep) max_n_sheep=args.max_n_sheep,
herding_cfg=herding_cfg)
for i in range(args.n_envs)] for i in range(args.n_envs)]
venv = SubprocVecEnv(env_fns) if args.n_envs > 1 else DummyVecEnv(env_fns) venv = SubprocVecEnv(env_fns) if args.n_envs > 1 else DummyVecEnv(env_fns)
eval_venv = DummyVecEnv([_make_env(99, args.seed + 999, frame_stack, eval_venv = DummyVecEnv([_make_env(99, args.seed + 999, frame_stack,
drive_mode, drive_mode,
difficulty=args.difficulty, difficulty=args.difficulty,
max_n_sheep=args.max_n_sheep)]) max_n_sheep=args.max_n_sheep,
herding_cfg=herding_cfg)])
print(f"[rl] difficulty={args.difficulty} max_n_sheep={args.max_n_sheep}") print(f"[rl] difficulty={args.difficulty} max_n_sheep={args.max_n_sheep}")
# Reward-shaping overrides (broadcast to every env instance). # Reward-shaping overrides (broadcast to every env instance).
+174
View File
@@ -0,0 +1,174 @@
"""Recurrent-PPO (LSTM) policy trainer for the herding env.
Motivation
----------
The MLP+frame-stack policy struggles with partial observability under
the 140° Webots LiDAR: the tracker briefly empties when the dog turns,
and sporadic FP tracks at static features confuse the policy. An LSTM
gives the policy unbounded temporal memory so it can:
* keep modelling sheep positions when the tracker briefly drops them,
* distinguish persistent (real) tracks from intermittent (phantom) ones.
This is the literature-correct fix for partial-observability + noisy
perception. Trains from scratch (no BC init) using vanilla PPO without
the KL-to-reference term (no reference exists when starting clean).
Usage
-----
python -m training.rl.train_lstm \\
--out training/runs/lstm_differential_field \\
--drive-mode differential --world field \\
--total-timesteps 3000000 \\
--use-webots-preset --fp-rate 0.0 --action-smooth 0.55
Frame stack is forced to 1 since the LSTM provides its own memory.
"""
from __future__ import annotations
import argparse
import os
import time
from pathlib import Path
import numpy as np
# Configure field geometry before other herding imports read it at module level.
from herding.world.geometry import configure_from_args as _configure_from_args
_configure_from_args()
from sb3_contrib import RecurrentPPO
from stable_baselines3.common.callbacks import EvalCallback
from stable_baselines3.common.vec_env import DummyVecEnv, SubprocVecEnv
from herding.world.geometry import MAX_SHEEP
from training.herding_env import HerdingEnv
def _make_env(rank: int, seed: int, drive_mode: str, difficulty: float,
max_n_sheep: int, herding_cfg):
def _init():
env = HerdingEnv(
max_n_sheep=max_n_sheep, difficulty=difficulty,
seed=seed + rank, frame_stack=1, drive_mode=drive_mode,
herding_cfg=herding_cfg,
)
return env
return _init
def main():
parser = argparse.ArgumentParser()
parser.add_argument("--out", required=True,
help="Output directory for the LSTM policy.")
parser.add_argument("--total-timesteps", type=int, default=3_000_000)
parser.add_argument("--n-envs", type=int, default=8)
parser.add_argument("--n-steps", type=int, default=256)
parser.add_argument("--lstm-hidden", type=int, default=128)
parser.add_argument("--lr", type=float, default=3e-4)
parser.add_argument("--seed", type=int, default=0)
parser.add_argument("--max-n-sheep", type=int, default=MAX_SHEEP)
parser.add_argument("--difficulty", type=float, default=1.0)
parser.add_argument("--drive-mode", default="differential",
choices=["differential", "mecanum"])
parser.add_argument("--world", default=None,
choices=["field", "field_round"])
parser.add_argument("--fp-rate", type=float, default=0.0)
parser.add_argument("--action-smooth", type=float, default=0.55)
parser.add_argument("--wheel-slip-std", type=float, default=0.05)
parser.add_argument("--use-webots-preset", action="store_true",
help="Train in the HERDING_WEBOTS env (140° FOV + tight tracker).")
parser.add_argument("--device", default="cpu")
args = parser.parse_args()
from herding.config import HerdingConfig, HERDING_WEBOTS, DomainRandomConfig, RobotConfig
if args.use_webots_preset:
herding_cfg = HERDING_WEBOTS.replace(
domain_random=DomainRandomConfig(
fp_rate=args.fp_rate,
wheel_slip_std=args.wheel_slip_std,
),
robot=RobotConfig(action_smooth=args.action_smooth),
)
print(f"[lstm] HERDING_WEBOTS preset + DR: fp_rate={args.fp_rate}")
else:
herding_cfg = None
if args.fp_rate > 0.0 or args.action_smooth > 0.0 or args.wheel_slip_std > 0.0:
herding_cfg = HerdingConfig(
domain_random=DomainRandomConfig(
fp_rate=args.fp_rate,
wheel_slip_std=args.wheel_slip_std,
),
robot=RobotConfig(action_smooth=args.action_smooth),
)
env_fns = [_make_env(i, args.seed, args.drive_mode, args.difficulty,
args.max_n_sheep, herding_cfg)
for i in range(args.n_envs)]
venv = SubprocVecEnv(env_fns) if args.n_envs > 1 else DummyVecEnv(env_fns)
eval_venv = DummyVecEnv([_make_env(99, args.seed + 999, args.drive_mode,
args.difficulty, args.max_n_sheep,
herding_cfg)])
out = Path(args.out)
out.mkdir(parents=True, exist_ok=True)
print(f"[lstm] drive_mode={args.drive_mode} world={os.environ.get('HERDING_WORLD', 'field')}")
print(f"[lstm] total_timesteps={args.total_timesteps} n_envs={args.n_envs} "
f"lr={args.lr} lstm_hidden={args.lstm_hidden}")
model = RecurrentPPO(
"MlpLstmPolicy", venv,
learning_rate=args.lr,
n_steps=args.n_steps,
batch_size=args.n_steps, # full rollout = one batch (matches LSTM episode boundaries)
n_epochs=4,
gamma=0.99,
gae_lambda=0.95,
clip_range=0.2,
ent_coef=0.0,
max_grad_norm=0.5,
policy_kwargs=dict(
net_arch=dict(pi=[256, 256], vf=[256, 256]),
lstm_hidden_size=args.lstm_hidden,
n_lstm_layers=1,
shared_lstm=False,
enable_critic_lstm=True,
),
device=args.device,
verbose=1,
seed=args.seed,
tensorboard_log=str(out / "tb"),
)
eval_cb = EvalCallback(
eval_venv,
best_model_save_path=str(out / "best"),
log_path=str(out / "evals"),
eval_freq=max(args.n_steps * args.n_envs, 20_000) // args.n_envs,
n_eval_episodes=5,
deterministic=True,
render=False,
)
t0 = time.time()
model.learn(total_timesteps=args.total_timesteps, callback=eval_cb,
progress_bar=True)
print(f"[lstm] training done in {time.time() - t0:.0f}s")
# Save best (by eval) if it exists; otherwise save final.
best = out / "best" / "best_model.zip"
if best.exists():
import shutil
shutil.copy(best, out / "policy.zip")
print(f"[lstm] best snapshot → {out / 'policy.zip'}")
else:
model.save(str(out / "policy.zip"))
print(f"[lstm] no eval beat init; final snapshot → {out / 'policy.zip'}")
model.save(str(out / "final.zip"))
if __name__ == "__main__":
main()
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.