Commit Graph

16 Commits

Author SHA1 Message Date
Johnny Fernandes eadeeafb32 Dual-shepherd soft axis-split (HERDING_AXIS_LEAK)
The strict 100/0 axis mask reaches drive standoff and deadlocks
because each dog has only one degree of freedom left to push the
flock. Soften the mask: each dog leads its assigned axis (full gain)
and contributes ``HERDING_AXIS_LEAK`` on the other axis. ``0.0`` is
the old strict behaviour; ``1.0`` is no mask (both dogs run full
policy, role-redundant). Default ``0.3`` breaks the deadlock while
preserving the "one dog per axis" coordination story.

Implementation:
* `controllers/shepherd_dog/shepherd_dog.py` reads
  `HERDING_AXIS_LEAK` from env / runtime cfg (clamped to [0, 1]),
  prints it next to the axis tag, and multiplies the off-axis
  velocity component by it instead of zeroing.
* `tools/run_webots.sh` writes `HERDING_AXIS_LEAK` into
  `herding_runtime.cfg` so Webots-stripped controller subprocesses
  still see it; defaults to 0.3 when unset.
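
The soft mask described above can be sketched as follows. This is a minimal illustration, not the controller's actual code: the helper names `clamp01` and `apply_axis_leak` are hypothetical, and the action is assumed to be a plain `(vx, vy)` pair.

```python
from __future__ import annotations


def clamp01(value: float) -> float:
    """Clamp a gain to the [0, 1] range."""
    return max(0.0, min(1.0, value))


def apply_axis_leak(vx: float, vy: float, axis: str, leak: float) -> tuple[float, float]:
    """Keep full gain on the assigned axis; scale the other axis by `leak`.

    Hypothetical helper: `axis` mirrors the `axis=x|y` tag, and `leak`
    plays the role of HERDING_AXIS_LEAK.
    """
    leak = clamp01(leak)
    if axis == "x":
        return vx, vy * leak
    if axis == "y":
        return vx * leak, vy
    return vx, vy  # no axis tag: single-dog behaviour, nothing is masked


# leak=0.0 reproduces the strict mask; leak=1.0 disables masking entirely.
strict = apply_axis_leak(1.0, 0.5, "x", 0.0)
soft = apply_axis_leak(1.0, 0.5, "x", 0.3)
unmasked = apply_axis_leak(1.0, 0.5, "x", 1.0)
```

With `leak=0.3` the x-axis dog keeps 30% of its y-velocity, which is the extra degree of freedom the strict mask removed.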

Webots smoke test (HERDING_NDOGS=2, HERDING_AXIS_LEAK=0.3, strombom,
diff/field, 5 sheep, LiDAR perception, no GT): **5/5 penned at step
13204**, vs the strict 100/0 mask which timed out at 0/5. Penning
trail 1/5 → 2/5 → 4/5 → 5/5 between steps 6200 and 13400 — slower
than single-dog (Strömbom diff/field n=5: 7528) as expected since
the work is split, but the coordination demonstrably succeeds.

This gives the write-up a clean three-row ablation (α = HERDING_AXIS_LEAK):
  α=0.0  (strict)  → deadlock, 0/5
  α=0.3  (default) → 5/5 @ 13204
  α=1.0  (no mask) → both dogs run full policy (single-dog
                     baseline applied twice; no axis story)

126 pytest cases still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 02:43:40 +00:00
Johnny Fernandes cfbf4a0267 Dual-shepherd axis-split (HERDING_NDOGS=2)
The launcher can now spawn two `ShepherdDog` robots, each masked to a
single axis of motion, so the herding workload is split orthogonally.

Mechanic:
* `HERDING_NDOGS=2` (default 1) tells `tools/run_webots.sh` to replace
  the single-dog node in the generated test world with two copies:
  - `ShepherdDogX` at (-4, -10), `customData "axis=x"`
  - `ShepherdDogY` at (+4, -10), `customData "axis=y"`
  Each spawn position sits south of the field interior so the pair
  doesn't collide with starting sheep.
* `controllers/shepherd_dog/shepherd_dog.py` reads `getCustomData()`
  at startup; when `axis=x|y` it zeroes the off-axis component of every
  action *after* speed modulation and *before* EMA smoothing. With
  `customData` empty the controller behaves identically to single-dog
  mode, so all existing launches are unaffected.
* The dog's emitter line now carries the robot's name
  (`dog:ShepherdDogX:x:y`), and `controllers/sheep/sheep.py` keeps a
  `dogs` dict keyed by name, picking the closest one each step for
  its flee target. Single-dog runs still use the legacy two-field
  `dog:x:y` format thanks to a length check.
* `HERDING_NDOGS` is written into `herding_runtime.cfg` and exported
  to subprocesses so future tooling can read it.
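
A rough sketch of the sheep-side handling of both message formats, under stated assumptions: the function names, the `"dog"` key used for the legacy format, and the decimal encoding of the coordinates are all hypothetical, and the real `sheep.py` may differ.

```python
import math
from typing import Dict, Optional, Tuple

Position = Tuple[float, float]


def parse_dog_message(message: str) -> Optional[Tuple[str, Position]]:
    """Parse `dog:<name>:<x>:<y>` or the legacy `dog:<x>:<y>` line."""
    parts = message.split(":")
    if parts[0] != "dog":
        return None
    if len(parts) == 4:  # named format used in dual-dog runs
        return parts[1], (float(parts[2]), float(parts[3]))
    if len(parts) == 3:  # legacy single-dog format, stored under a fixed key
        return "dog", (float(parts[1]), float(parts[2]))
    return None


def closest_dog(dogs: Dict[str, Position], sheep_xy: Position) -> Optional[Position]:
    """Return the position of the nearest known dog, or None if there is none."""
    if not dogs:
        return None
    return min(
        dogs.values(),
        key=lambda p: math.hypot(p[0] - sheep_xy[0], p[1] - sheep_xy[1]),
    )


# Two dogs at the spawn positions listed above; the sheep sits at (3, -2).
dogs: Dict[str, Position] = {}
for line in ("dog:ShepherdDogX:-4.0:-10.0", "dog:ShepherdDogY:4.0:-10.0"):
    parsed = parse_dog_message(line)
    if parsed is not None:
        name, position = parsed
        dogs[name] = position

flee_from = closest_dog(dogs, (3.0, -2.0))
```

Keying the dict by robot name means each dog's latest position overwrites its previous one, so the closest-dog choice always uses fresh data.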

Verified behaviour in Webots smoke tests (HERDING_NDOGS=2, strombom,
diff/field, 5 sheep): both dogs spawn with the expected names and
axis tags, the dual-dog status print appears, each dog acts only on
its assigned axis early in the trial, and the masking is internally
consistent. The pair stalls before penning under pure axis-split
because each dog reaches its drive standoff and then has only one
degree of freedom. This is a useful research finding for the
write-up; a coordination strategy (shared CoM, role-switching, etc.)
is future work.

126 pytest cases still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 02:35:38 +00:00
Johnny Fernandes d00da52c3c Portable Python env + 360° LiDAR ablation flag
Two small features.

(1) Portable interpreter
* `tools/setup_env.sh` exports HERDING_PYTHON (default points to the
  project's conda env; override in your shell to retarget).
* Both `controllers/*/runtime.ini` files now use Webots' env-var
  expansion: `COMMAND = $(HERDING_PYTHON)` so the Webots-launched
  controllers pick up the same interpreter as the bash scripts.
* `tools/run_webots.sh`, `tools/webots_sweep{,_gt}.sh` and
  `tools/calibrate_mecanum.sh` all source `setup_env.sh` at the top
  instead of hard-coding `/home/jalf/miniconda3/envs/tir/bin`.
The hard-coded conda path is now exactly one line in `setup_env.sh`'s
fallback default — a single place to edit on a new machine, or
override-once via `export HERDING_PYTHON=...`.

(2) 360° LiDAR FOV ablation
* New `LIDAR_WEBOTS_360` preset matches the existing
  `protos/ShepherdDog360.proto` (360 rays / 2π FOV / 15 m range).
* `tools/run_webots.sh` reads `HERDING_LIDAR=140|360` and swaps the
  diff-drive proto accordingly (mecanum keeps 140° — the
  ShepherdDogMecanum proto has its own LiDAR section). The variant
  is written into `herding_runtime.cfg` so the controller can read
  it even when Webots strips env vars.
* `controllers/shepherd_dog/shepherd_dog.py` picks the matching
  `lidar_cfg` (`HERDING_WEBOTS.lidar` for 140°, `LIDAR_WEBOTS_360`
  otherwise) and feeds it to `detections_from_scan` so the
  perception pipeline interprets ray angles + max range correctly.
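
The env-then-cfg lookup can be sketched like this. It is only an illustration: the `KEY=VALUE` file format, the precedence of the environment over the cfg file, and the helper names are assumptions, not details confirmed by this commit.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict, Mapping, Optional


def read_runtime_cfg(path: Path) -> Dict[str, str]:
    """Parse KEY=VALUE lines (assumed format); a missing file yields {}."""
    values: Dict[str, str] = {}
    if not path.is_file():
        return values
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def lookup_setting(
    name: str,
    cfg: Mapping[str, str],
    default: str,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Prefer the environment, then the cfg file, then the default."""
    env = os.environ if env is None else env
    value = env.get(name)
    if value:
        return value
    return cfg.get(name, default)


# Simulate a controller whose environment was stripped by Webots.
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "herding_runtime.cfg"
    cfg_path.write_text("HERDING_LIDAR=360\n")
    cfg = read_runtime_cfg(cfg_path)
    lidar_variant = lookup_setting("HERDING_LIDAR", cfg, "140", env={})
```

Here `lidar_variant` comes from the cfg file because the simulated environment is empty, which is the situation the runtime cfg exists to cover.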

Smoke test: `HERDING_LIDAR=360 tools/run_webots.sh 5 strombom
differential field` launches with `ShepherdDog360.proto`, the
controller logs the new mode/drive/world line, and the dog is
penning sheep through 360° perception (4/5 at step 19200 before I
killed the test). No retraining required because the gym already
trains under `LIDAR_FULL` (360° preset).

126 pytest cases still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 02:19:15 +00:00
Johnny Fernandes 10c01a938e Drop versioning vocabulary, polish docstrings, fix world-aware policy resolution
User-facing pass after deciding that the project will be a single
submission with no inner iterations.

* Remove every "v1"/"v2"/"versioning" reference from the docs:
  - README mecanum section trims the "v1 predates the rewrite" prose
    in favour of a self-contained retrain recipe.
  - The 3.2 GB `training/runs/v1_clean/` backup directory is deleted.
* Refresh control-layer docstrings:
  - `sheep_tracker.py` header now describes the three actual pipeline
    stages (consensus, prediction, pen latching) instead of layering
    the consensus stage on top of a stale "predictive mode" preamble.
  - `controllers/shepherd_dog/shepherd_dog.py` mode list is
    up-to-date — adds `universal`, removes outdated single-policy
    default paths, mentions `HERDING_USE_GT=1` as the perception
    ablation.
* Refresh training command examples:
  - `training/bc/collect.py` and `training/bc/pretrain.py` usage
    snippets show the world-suffixed paths the Makefile actually
    uses; the `--out` arg is now required so old "demos.npz"
    invocations error loudly instead of silently overwriting.
  - `training/README.md` rewritten — drops the legacy `runs/bc`
    diagram, documents the per-(drive, world) pipeline, and adds
    the mecanum retraining caveat.
* Fix policy-directory resolution end-to-end:
  - `tools/run_webots.sh` now tries
    `training/runs/{bc,rl}_<drive>_<world>` first, then the drive-
    only path, then the bare-mode legacy path — matching the actual
    on-disk layout. Previously it looked for `bc_<drive>` (no
    world) and silently fell back to `bc`, masking the world
    selection.
  - `controllers/shepherd_dog/shepherd_dog.py:_resolve_policy_dir`
    gets the same fix, which also exposed a latent NameError: it
    referenced `DRIVE_MODE` before that variable was set at module load. The
    block is restructured so MODE/DRIVE_MODE/WORLD are resolved
    first, then the function uses them as explicit arguments.
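
The lookup order can be sketched as below. This is a hedged illustration rather than the code of `_resolve_policy_dir`: the function names and the `diff`/`field` directory tokens are hypothetical, and only the most-specific-first ordering is taken from the message above.

```python
import tempfile
from pathlib import Path
from typing import List, Optional


def candidate_policy_dirs(runs_root: Path, mode: str, drive: str, world: str) -> List[Path]:
    """List candidate directories from most to least specific."""
    return [
        runs_root / f"{mode}_{drive}_{world}",  # per-(drive, world) policy
        runs_root / f"{mode}_{drive}",          # drive-only path
        runs_root / mode,                       # bare-mode legacy path
    ]


def resolve_policy_dir(runs_root: Path, mode: str, drive: str, world: str) -> Optional[Path]:
    """Return the first candidate that exists on disk, or None."""
    for candidate in candidate_policy_dirs(runs_root, mode, drive, world):
        if candidate.is_dir():
            return candidate
    return None


# Demonstrate the fallback chain against a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    runs = Path(tmp)
    (runs / "bc").mkdir()
    (runs / "bc_diff_field").mkdir()
    world_specific = resolve_policy_dir(runs, "bc", "diff", "field")
    legacy_fallback = resolve_policy_dir(runs, "bc", "mec", "round")
    missing = resolve_policy_dir(runs, "rl", "diff", "field")
    found_names = (world_specific.name, legacy_fallback.name)
```

Passing mode, drive and world as explicit arguments avoids depending on module-level globals being set before the function is called.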

126 pytest cases still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 01:50:54 +00:00
Johnny Fernandes ee77c8606c Gym mecanum kinematics matching to Webots roller-hinge proto
Mecanum proto rewrite in b3cf990 made the wheels truly omnidirectional
in Webots, but with asymmetric slip: forward command produces ~89% of
textbook speed while strafe produces only ~38% plus a consistent
~28% backward bleed-through. The v1 BC/RL policies, trained on perfect
mecanum gym kinematics, could not herd under the new dynamics. To
unblock that:

* `mecanum_kinematics_step` gains two parameters that scale the
  realised motion to match a deployed-platform calibration:
    - strafe_efficiency  ∈ (0, 1]  default 1.0
    - strafe_to_forward_bleed     default 0.0
  Forward motion is untouched (textbook X-pattern continues to apply
  to vx_body); only the lateral channel is scaled and bleed is added.
* `RobotConfig` exposes both as drive-config fields with the same
  pass-through defaults so existing diff-drive code and existing
  mecanum training pipelines see no behaviour change.
* `HERDING_MEC_WEBOTS` preset bakes in the values measured against the
  current Webots mecanum proto (strafe_efficiency=0.4,
  strafe_to_forward_bleed=-0.28). Training mecanum BC/RL with this
  preset produces policies that compensate for the imperfect
  physical mecanum at deploy.
* `HerdingEnv` plumbs `RobotConfig.strafe_*` through to
  `mecanum_kinematics_step` so the preset takes effect.
* tools/gen_mecanum_wheels.py is added so the proto's 32 roller
  hinges can be regenerated by editing a single set of constants
  rather than hand-editing 1500+ lines of VRML.
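
One way to read the two calibration parameters is sketched below. The helper `apply_strafe_calibration` is hypothetical and is not the signature of `mecanum_kinematics_step`; it assumes the bleed is proportional to the textbook lateral velocity, which is an interpretation of the message rather than something it states.

```python
from typing import Tuple


def apply_strafe_calibration(
    vx_body: float,
    vy_body: float,
    strafe_efficiency: float = 1.0,
    strafe_to_forward_bleed: float = 0.0,
) -> Tuple[float, float]:
    """Scale the lateral channel and add its bleed into the forward channel.

    Illustrative model only: `vx_body` and `vy_body` are the textbook
    body-frame velocities, and the defaults are pass-through values.
    """
    if not 0.0 < strafe_efficiency <= 1.0:
        raise ValueError("strafe_efficiency must be in (0, 1]")
    vx_real = vx_body + strafe_to_forward_bleed * vy_body  # forward term itself is unscaled
    vy_real = strafe_efficiency * vy_body                  # lateral under-shoot
    return vx_real, vy_real


# Pure strafe at the textbook 1.33 m/s with the preset values quoted above.
pure_strafe = apply_strafe_calibration(0.0, 1.33, 0.4, -0.28)

# With the defaults the velocities pass through unchanged.
pass_through = apply_strafe_calibration(1.33, 0.0)
```

Under this model the pure-strafe case gives roughly -0.37 m/s forward and 0.53 m/s lateral, in line with the Webots strafe calibration recorded in b3cf990.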

Tests:
* 4 new mecanum_kinematics_step tests (default pass-through, strafe
  scaling, backward bleed, forward unaffected by strafe params).
* 3 new RobotConfig tests (defaults, validation, preset shape).
* Sanity check: gym strafe with HERDING_MEC_WEBOTS over 100 steps
  reproduces the Webots calibration to 2 decimal places.

126 unit tests pass (was 120).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 01:09:47 +00:00
Johnny Fernandes b3cf9909a8 Mecanum proto: replace cylinder wheels with physical roller hinges
Each wheel is now a hub solid + 8 passive HingeJoint rollers (capsules
tilted 45° in body xy plane at the bottom contact point) instead of
a single plain Cylinder. The rollers free-spin around their tilt axes
so the wheel exhibits mecanum X-pattern behaviour: gym-frame strafe
commands now produce body strafe in Webots, where before they
produced wrong-direction motion (the plain cylinders behaved as 4-
wheel skid-steer).

Calibration on flat field, 200 steps each:
  command           axis   gym predict   webots out   err
  vx=0.5  vy=0      +x     1.33 m/s      1.19 m/s     10.9%
                    +y     0    m/s     -0.10 m/s     ~clean
  vx=0    vy=0.5    +y     1.33 m/s      0.50 m/s     62.1%
                    +x     0    m/s     -0.37 m/s     noticeable mecanum coupling

Strafe is imperfect (-x bleed-through, magnitude under-shoot) but
direction is correct and the platform is now omnidirectional. Forward
motion is high-fidelity. Tilt signs assigned so diagonal pairs FL+RR
and FR+RL share the same body-frame roller orientation (the standard
X pattern). Two contact-material names "MecanumWheelA/B" are kept for
diagnostic separation; both use the same isotropic Coulomb friction
of 2.0 with forceDependentSlip 0.005.

tools/run_webots.sh ships the matching contactProperties block on
every mecanum launch (re-emitted into the temporary world copy).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 21:54:35 +00:00
Johnny Fernandes 03b2df5656 Fix run_webots.sh exit-1 when N=0 (calibration mode)
`active=$(grep -c '^Sheep' "$DST")` prints 0 but exits with status 1
when no sheep are left in the world, which trips `set -e` and kills
the script before it can launch Webots. Append `|| true` so the
calibration mode (N=0) can actually run.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 20:40:28 +00:00
Johnny Fernandes dd5ac669e5 Webots sim-to-real fixes, DAgger pipeline, 360° proto variant
Today's session worked across the full Webots delivery stack — found and
fixed a cluster of bugs blocking the BC/RL transfer, then explored
training-side mitigations for the residual perception gap.

Bug fixes:
- Makefile FP_RATE default 2.0 → 0.0: BC demos used fp_rate=0 but RL
  fine-tune defaulted to fp_rate=2, poisoning the BC obs distribution
  and stalling PPO at 0% success across 1.46M+ steps.
- controllers/{shepherd_dog,sheep}/runtime.ini: Webots was launching
  controllers under system python3 (no numpy) and they were crashing
  silently. Pinned to the conda tir env.
- herding/config.py HERDING_WEBOTS preset: pen_latch_depth 0.5 → 2.0,
  max_new_tracks_per_step 3 → 1, static_reject 0.8 → 1.2. Stops phantom
  FPs near the gate from latching as permanently-penned tracks.
- herding/perception/sheep_tracker.py: penned tracks now decay at
  forget_steps × 8 instead of living forever. Adds get_positions
  min_freshness filter for deploy-time use.

Training/eval matches deployment:
- training/bc/collect.py: --dagger-policy flag for DAgger rollouts
  (policy drives, teacher labels) + --use-webots-preset for matched
  140° tracker + DR config.
- controllers/shepherd_dog/shepherd_dog.py: scan-fallback (0, 0.6) when
  BC/RL sees empty sheep_positions — recovers from FOV gaps.

Tooling:
- tools/dagger_round.sh: one-shot DAgger round (collect + concat + bc).
- tools/webots_sweep_gt.sh: full sweep with HERDING_USE_GT=1 for the
  perception-gap diagnosis matrix.
- protos/ShepherdDog360.proto: 360° FOV variant for the FOV-ablation
  comparison. Canonical proto stays at 140° per project spec.

Artifacts: v1 BC/RL policies for all 4 (drive × world) combos trained
in clean gym (success: diff/field 90-100%, diff/round 58%, mec/field
60-100%, mec/round 50-100%). DAgger r1/r2 BCs for diff/field show
12%→38% progression on the gym HERDING_WEBOTS proxy but did not
close the gap to actual Webots LiDAR (0/5 throughout). Next: LSTM
policy or learned tracker per the project-state memory.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 17:21:02 +00:00
Johnny Fernandes 5c2ee4bba5 Checkpoint 8 2026-05-12 22:41:03 +01:00
Johnny Fernandes a01a5c9cef Checkpoint 7 2026-05-11 12:21:51 +01:00
Johnny Fernandes fce0e0c786 Checkpoint 6 2026-05-11 10:35:48 +01:00
Johnny Fernandes b457155538 Checkpoint 5 - incomplete 2026-05-11 10:35:39 +01:00
Johnny Fernandes 6688325d89 Checkpoint 4 2026-05-11 00:42:52 +01:00
Johnny Fernandes 2a6db038df Checkpoint 3 2026-05-10 12:46:14 +01:00
Johnny Fernandes 1bb9415414 Checkpoint 2 2026-05-07 22:00:10 +01:00
Johnny Fernandes f256e99a76 Styling and sheep behaviour 2026-04-22 21:01:42 +01:00