Commit Graph

30 Commits

Author SHA1 Message Date
Johnny Fernandes e86fee5ae8 Per-sheep pen-time metrics, seed support, make webots → menu
* `controllers/shepherd_dog/shepherd_dog.py`
  - Tracks the first step at which each sheep crosses the gate; on
    auto-finish (all sheep penned) prints a `[results]` summary
    block: mode/drive/world/lidar/dogs/seed line, total simulated
    time, per-sheep penning order with absolute step + seconds since
    sim start, and the gate spread between the first and last
    penning.
  - Reads `HERDING_SEED` (env / runtime cfg) and seeds the
    controller's RNG when set. Empty = time-based default = old
    non-deterministic behaviour.
* `controllers/sheep/sheep.py` reads `HERDING_SEED` the same way
  (loading `herding_runtime.cfg` itself so it works even when
  Webots strips env vars) and seeds Python's RNG XOR'd with the
  sheep's name hash, so a fixed seed gives a reproducible flock
  trajectory without all sheep starting from identical wander state.
* `tools/run_webots.sh` writes `HERDING_SEED` into the runtime cfg
  (empty when unset so existing scripts keep their stochastic
  behaviour).
* `tools/webots_menu.sh` gains a Seed prompt (random / fixed
  integer); the launch summary box shows the choice next to the
  perception row.
* `Makefile`
  - `make webots` now launches the interactive picker (replacing the
    old positional invocation).
  - `make webots_quick MODE=… DRIVE=… WORLD=… N=…` is the old
    positional path, kept for batch / scripted use.
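
The per-sheep seeding described above can be sketched roughly as follows. This is an illustration, not the controller's actual code: `derive_seed` and `make_rng` are hypothetical names, and CRC32 is an assumed stand-in for the "name hash", chosen because Python's built-in `hash()` on strings is salted per process unless `PYTHONHASHSEED` is fixed.

```python
import random
import zlib


def derive_seed(base_seed, robot_name):
    """Return a per-robot seed, or None when no base seed is configured."""
    if base_seed is None or str(base_seed).strip() == "":
        return None  # empty value keeps the non-deterministic default
    # Assumption: any hash that is stable across processes would do here.
    name_hash = zlib.crc32(robot_name.encode("utf-8"))
    return int(base_seed) ^ name_hash


def make_rng(base_seed, robot_name):
    """Build an independent RNG; Random(None) seeds from OS entropy or time."""
    return random.Random(derive_seed(base_seed, robot_name))


if __name__ == "__main__":
    # Same seed and name give the same stream; different names diverge.
    print(make_rng("42", "sheep_0").random() == make_rng("42", "sheep_0").random())
    print(derive_seed("42", "sheep_0") != derive_seed("42", "sheep_1"))
```

XOR-ing the shared seed with a per-name hash keeps one knob (`HERDING_SEED`) for the whole run while still giving each sheep its own stream.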

Smoke-tested: menu renders Mode → Drive → World → LiDAR → Dogs
→ Sheep → Perception → Seed → Headless prompts and shows the
selected Seed value in the launch summary. 126 pytest cases still
pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 10:33:34 +00:00
Johnny Fernandes eadeeafb32 Dual-shepherd soft axis-split (HERDING_AXIS_LEAK)
The strict 100/0 axis mask reaches drive standoff and deadlocks
because each dog has only one degree of freedom left to push the
flock. Soften the mask: each dog leads its assigned axis (full gain)
and contributes ``HERDING_AXIS_LEAK`` on the other axis. ``0.0`` is
the old strict behaviour; ``1.0`` is no mask (both dogs run full
policy, role-redundant). Default ``0.3`` breaks the deadlock while
preserving the "one dog per axis" coordination story.

Implementation:
* `controllers/shepherd_dog/shepherd_dog.py` reads
  `HERDING_AXIS_LEAK` from env / runtime cfg (clamped to [0, 1]),
  prints it next to the axis tag, and multiplies the off-axis
  velocity component by it instead of zeroing.
* `tools/run_webots.sh` writes `HERDING_AXIS_LEAK` into
  `herding_runtime.cfg` so Webots-stripped controller subprocesses
  still see it; defaults to 0.3 when unset.
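
A minimal sketch of the soft mask, assuming the action is a planar velocity `(vx, vy)` and the axis tag is `"x"`, `"y"`, or `None`. The function name `apply_axis_leak` is hypothetical; only the clamp to [0, 1] and the off-axis multiplication come from the description above.

```python
def apply_axis_leak(vx, vy, axis, leak):
    """Scale the off-axis velocity component by `leak` (clamped to [0, 1])."""
    leak = min(1.0, max(0.0, float(leak)))
    if axis == "x":
        return vx, vy * leak  # this dog leads x and leaks into y
    if axis == "y":
        return vx * leak, vy  # this dog leads y and leaks into x
    return vx, vy  # no axis tag: single-dog behaviour, no mask


if __name__ == "__main__":
    print(apply_axis_leak(1.0, 2.0, "x", 0.0))  # strict mask
    print(apply_axis_leak(1.0, 2.0, "x", 1.0))  # no mask
```

With `leak=0.0` this reduces to the strict mask and with `leak=1.0` it is the identity, which matches the two endpoints described in this commit.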

Webots smoke test (HERDING_NDOGS=2, HERDING_AXIS_LEAK=0.3, strombom,
diff/field, 5 sheep, LiDAR perception, no GT): **5/5 penned at step
13204**, vs the strict 100/0 mask which timed out at 0/5. Penning
trail 1/5 → 2/5 → 4/5 → 5/5 between steps 6200 and 13400 — slower
than single-dog (Strömbom diff/field n=5: 7528) as expected since
the work is split, but the coordination demonstrably succeeds.

This gives the writeup a clean three-row ablation:
  α=0.0  (strict)  → deadlock, 0/5
  α=0.3  (default) → 5/5 @ 13204
  α=1.0  (no mask) → both dogs run full policy (single-dog
                     baseline applied twice; no axis story)

126 pytest cases still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 02:43:40 +00:00
Johnny Fernandes cfbf4a0267 Dual-shepherd axis-split (HERDING_NDOGS=2)
The launcher can now spawn two `ShepherdDog` robots, each masked to a
single axis of motion, so the herding workload is split orthogonally.

Mechanic:
* `HERDING_NDOGS=2` (default 1) tells `tools/run_webots.sh` to replace
  the single-dog node in the generated test world with two copies:
  - `ShepherdDogX` at (-4, -10), `customData "axis=x"`
  - `ShepherdDogY` at (+4, -10), `customData "axis=y"`
  Each spawn position sits south of the field interior so the pair
  doesn't collide with starting sheep.
* `controllers/shepherd_dog/shepherd_dog.py` reads `getCustomData()`
  at startup; when `axis=x|y` it zeroes the off-axis component of every
  action *after* speed modulation and *before* EMA smoothing. With
  `customData` empty the controller behaves identically to single-dog
  mode, so all existing launches are unaffected.
* The dog's emitter line now carries the robot's name
  (`dog:ShepherdDogX:x:y`), and `controllers/sheep/sheep.py` keeps a
  `dogs` dict keyed by name, picking the closest one each step for
  its flee target. Single-dog runs still use the legacy two-field
  `dog:x:y` format thanks to a length check.
* `HERDING_NDOGS` is written into `herding_runtime.cfg` and exported
  to subprocesses so future tooling can read it.
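
The message handling above might look roughly like this on the sheep side. All function names are hypothetical, and the parsing of `customData` assumes a whitespace-separated `key=value` layout; only the two wire formats (`dog:x:y` and `dog:<name>:x:y`) and the closest-dog rule are taken from the commit text.

```python
import math


def parse_axis(custom_data):
    """Return 'x' or 'y' from a string such as 'axis=x', else None."""
    for token in custom_data.split():
        key, _, value = token.partition("=")
        if key == "axis" and value in ("x", "y"):
            return value
    return None


def parse_dog_message(message, default_name="dog"):
    """Parse either wire format; return (name, x, y) or None if malformed."""
    parts = message.split(":")
    if parts[0] != "dog":
        return None
    if len(parts) == 3:  # legacy single-dog format: dog:x:y
        name, xs, ys = default_name, parts[1], parts[2]
    elif len(parts) == 4:  # named format: dog:<name>:x:y
        name, xs, ys = parts[1], parts[2], parts[3]
    else:
        return None
    return name, float(xs), float(ys)


def closest_dog(dogs, sheep_x, sheep_y):
    """Pick the nearest dog position from a {name: (x, y)} dict."""
    if not dogs:
        return None
    return min(
        dogs.values(),
        key=lambda pos: math.hypot(pos[0] - sheep_x, pos[1] - sheep_y),
    )


if __name__ == "__main__":
    dogs = {}
    for msg in ("dog:ShepherdDogX:-4.0:-10.0", "dog:ShepherdDogY:4.0:-10.0"):
        name, x, y = parse_dog_message(msg)
        dogs[name] = (x, y)
    print(closest_dog(dogs, 3.0, 0.0))
```

Keying the dict by robot name means a later message from the same dog overwrites its previous position instead of adding a stale entry.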

Verified behaviour in Webots smoke tests (HERDING_NDOGS=2, strombom,
diff/field, 5 sheep): both dogs spawn with the expected names and
axis tags, the dual-dog status print appears, each dog acts only on
its assigned axis early in the trial, and the masking is internally
consistent. Under pure axis-split the pair stalls before penning
because each dog reaches its drive standoff and then has only one
degree of freedom. This is a useful research finding for the write-up;
a coordination strategy (shared CoM, role-switching, etc.) is future
work.

126 pytest cases still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 02:35:38 +00:00
Johnny Fernandes d00da52c3c Portable Python env + 360° LiDAR ablation flag
Two small features.

(1) Portable interpreter
* `tools/setup_env.sh` exports HERDING_PYTHON (default points to the
  project's conda env; override in your shell to retarget).
* Both `controllers/*/runtime.ini` files now use Webots' env-var
  expansion: `COMMAND = $(HERDING_PYTHON)` so the Webots-launched
  controllers pick up the same interpreter as the bash scripts.
* `tools/run_webots.sh`, `tools/webots_sweep{,_gt}.sh` and
  `tools/calibrate_mecanum.sh` all source `setup_env.sh` at the top
  instead of hard-coding `/home/jalf/miniconda3/envs/tir/bin`.
The hard-coded conda path is now exactly one line in `setup_env.sh`'s
fallback default — a single place to edit on a new machine, or
override-once via `export HERDING_PYTHON=...`.

(2) 360° LiDAR FOV ablation
* New `LIDAR_WEBOTS_360` preset matches the existing
  `protos/ShepherdDog360.proto` (360 rays / 2π FOV / 15 m range).
* `tools/run_webots.sh` reads `HERDING_LIDAR=140|360` and swaps the
  diff-drive proto accordingly (mecanum keeps 140° — the
  ShepherdDogMecanum proto has its own LiDAR section). The variant
  is written into `herding_runtime.cfg` so the controller can read
  it even when Webots strips env vars.
* `controllers/shepherd_dog/shepherd_dog.py` picks the matching
  `lidar_cfg` (`HERDING_WEBOTS.lidar` for 140°, `LIDAR_WEBOTS_360`
  otherwise) and feeds it to `detections_from_scan` so the
  perception pipeline interprets ray angles + max range correctly.
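
To illustrate why the perception pipeline needs the matching config, here is a small sketch of how ray bearings could be derived from a preset. `LidarConfig`, `ray_angles`, and `pick_lidar_cfg` are hypothetical names, and the angle convention (bearings relative to the heading, swept from negative to positive) is an assumption rather than the project's or Webots' documented layout. Only the 360-ray / 2π / 15 m values come from the commit text.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LidarConfig:
    n_rays: int
    fov_rad: float
    max_range_m: float


# Values stated in the commit message for the 360-degree preset.
LIDAR_360 = LidarConfig(n_rays=360, fov_rad=2 * math.pi, max_range_m=15.0)


def ray_angles(cfg):
    """Bearing of each ray relative to the robot heading (assumed convention)."""
    if math.isclose(cfg.fov_rad, 2 * math.pi):
        # Full circle: -pi and +pi are the same direction, so skip the endpoint.
        step = cfg.fov_rad / cfg.n_rays
        return [-math.pi + i * step for i in range(cfg.n_rays)]
    if cfg.n_rays == 1:
        return [0.0]
    step = cfg.fov_rad / (cfg.n_rays - 1)
    return [-cfg.fov_rad / 2 + i * step for i in range(cfg.n_rays)]


def pick_lidar_cfg(variant, presets):
    """Map a HERDING_LIDAR value such as '140' or '360' to a preset."""
    if variant not in presets:
        raise ValueError(f"unknown LiDAR variant: {variant!r}")
    return presets[variant]


if __name__ == "__main__":
    cfg = pick_lidar_cfg("360", {"360": LIDAR_360})
    angles = ray_angles(cfg)
    print(len(angles), angles[0], angles[-1] < math.pi)
```

Feeding a 360° scan through a 140° config (or the reverse) would assign the wrong bearing to every ray, which is why the variant is threaded through to the controller.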

Smoke test: `HERDING_LIDAR=360 tools/run_webots.sh 5 strombom
differential field` launches with `ShepherdDog360.proto`, the
controller logs the new mode/drive/world line, and the dog is
penning sheep through 360° perception (4/5 at step 19200 before I
killed the test). No retraining required because the gym already
trains under `LIDAR_FULL` (360° preset).

126 pytest cases still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 02:19:15 +00:00
Johnny Fernandes 7ab69ab0f3 Rename multi-segment functions to two-concept names; polish docstrings
Naming pass: rename functions whose third+ segment is redundant or
implementation-detail, sticking to the codebase's preferred
``noun_verb`` / ``verb_noun`` two-concept idiom. Renames are atomic
across definitions, callers, and tests.

  is_penned_position        →  is_penned
  modulate_speed_near_sheep →  modulate_speed
  mecanum_kinematics_step   →  mecanum_step
  policy_forward_mean       →  forward_mean

Two-concept patterns like ``velocity_to_wheels`` / ``detections_from_scan``
/ ``make_strombom_predictor`` are left alone — they're idiomatic
converters / factories that read as a single concept, and the longer
form aids grep-ability.

Docstring polish:
* ``herding/config.py`` header drops the "previously lived as a
  module-level literal" historical framing — we ship as a single
  thing, so the refactor anecdote no longer earns its keep. The
  usage examples now mention both ``HERDING_WEBOTS`` and
  ``HERDING_MEC_WEBOTS`` presets.

126 pytest cases still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 01:58:15 +00:00
Johnny Fernandes 10c01a938e Drop versioning vocabulary, polish docstrings, fix world-aware policy resolution
User-facing pass after deciding that the project ships as a single
submission with no internal iterations.

* Remove every "v1"/"v2"/"versioning" reference from the docs:
  - README mecanum section trims the "v1 predates the rewrite" prose
    in favour of a self-contained retrain recipe.
  - The 3.2 GB `training/runs/v1_clean/` backup directory is deleted.
* Refresh control-layer docstrings:
  - `sheep_tracker.py` header now describes the three actual pipeline
    stages (consensus, prediction, pen latching) instead of layering
    the consensus stage on top of a stale "predictive mode" preamble.
  - `controllers/shepherd_dog/shepherd_dog.py` mode list is
    up-to-date — adds `universal`, removes outdated single-policy
    default paths, mentions `HERDING_USE_GT=1` as the perception
    ablation.
* Refresh training command examples:
  - `training/bc/collect.py` and `training/bc/pretrain.py` usage
    snippets show the world-suffixed paths the Makefile actually
    uses; the `--out` arg is now required so old "demos.npz"
    invocations error loudly instead of silently overwriting.
  - `training/README.md` rewritten — drops the legacy `runs/bc`
    diagram, documents the per-(drive, world) pipeline, and adds
    the mecanum retraining caveat.
* Fix policy-directory resolution end-to-end:
  - `tools/run_webots.sh` now tries
    `training/runs/{bc,rl}_<drive>_<world>` first, then the drive-
    only path, then the bare-mode legacy path — matching the actual
    on-disk layout. Previously it looked for `bc_<drive>` (no
    world) and silently fell back to `bc`, masking the world
    selection.
  - `controllers/shepherd_dog/shepherd_dog.py:_resolve_policy_dir`
    gets the same fix, which also exposed a latent NameError: the
    function referenced `DRIVE_MODE` before that variable was set at
    module load. The block is restructured so MODE/DRIVE_MODE/WORLD
    are resolved first and then passed to the function as explicit
    arguments.
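
The lookup order described above (most specific directory first) can be sketched as below. `resolve_policy_dir` here is a hypothetical stand-alone version, not the project's `_resolve_policy_dir`; it assumes the three candidate names are `<mode>_<drive>_<world>`, `<mode>_<drive>`, and `<mode>` under a runs root, and takes everything as explicit arguments.

```python
from pathlib import Path


def resolve_policy_dir(runs_root, mode, drive, world):
    """Return the first existing policy directory, most specific first."""
    root = Path(runs_root)
    candidates = [
        root / f"{mode}_{drive}_{world}",  # per-(drive, world) layout
        root / f"{mode}_{drive}",          # drive-only fallback
        root / mode,                       # bare-mode legacy path
    ]
    for candidate in candidates:
        if candidate.is_dir():
            return candidate
    return None  # caller decides how to report a missing policy


if __name__ == "__main__":
    print(resolve_policy_dir("training/runs", "bc", "diff", "field"))
```

Passing mode, drive, and world as arguments avoids depending on module-level globals being set before the function is first called.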

126 pytest cases still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 01:50:54 +00:00
Johnny Fernandes 2d23289052 Consensus tracker + active scan close Webots 140° LiDAR gap
Two deploy-time fixes that take v1 360°-trained BC/RL from 0/n to n/n
penned on the canonical 140° LiDAR proto for diff/field:

* SheepTracker now supports a consensus stage: new detections start as
  candidate tracks invisible to get_positions(). A candidate must
  accumulate consensus_k matches within consensus_radius_m of itself
  inside a consensus_max_age window to be promoted; otherwise it
  expires. Real sheep move ≪0.05 m per step, so they self-confirm within
  3 frames; wall-return cluster centroids jitter beyond 0.3 m as the dog
  moves and never promote. consensus_k=1 (default) is a no-op so unconfigured
  callers and HERDING_DEFAULT keep prior behaviour.
* HERDING_WEBOTS preset gets consensus_k=3, radius=0.3, max_age=20,
  plus longer forget_steps=300 and predict_steps=180 so confirmed
  sheep persist through long FOV-occlusion gaps a narrow 140° cone
  produces. max_new_tracks_per_step=1 still rate-caps spawn bursts.
* shepherd_dog.py BC/RL empty-obs fallback now rotates the desired
  heading with step_count so the cone actively sweeps the field
  instead of driving due north into the wall.
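
The candidate stage can be illustrated with a stripped-down gate. This is a sketch under assumptions, not the real `SheepTracker`: the class name `ConsensusGate` is hypothetical, a candidate's position is updated to its latest matching detection, and prediction, pen latching, and the per-step spawn cap are left out.

```python
import math


class ConsensusGate:
    """Promote a detection only after k nearby hits within max_age steps."""

    def __init__(self, k=1, radius_m=0.3, max_age=20):
        self.k = k
        self.radius_m = radius_m
        self.max_age = max_age
        self.candidates = []  # each entry: [x, y, hits, age]
        self.confirmed = []   # each entry: (x, y)

    def _near(self, ax, ay, bx, by):
        return math.hypot(ax - bx, ay - by) <= self.radius_m

    def update(self, detections):
        # Age every candidate and drop those that outlived the window.
        for cand in self.candidates:
            cand[3] += 1
        self.candidates = [c for c in self.candidates if c[3] <= self.max_age]

        for x, y in detections:
            # Detections near a confirmed track simply refresh it.
            idx = next(
                (i for i, (cx, cy) in enumerate(self.confirmed)
                 if self._near(x, y, cx, cy)),
                None,
            )
            if idx is not None:
                self.confirmed[idx] = (x, y)
                continue
            # Otherwise match an existing candidate or start a new one.
            cand = next(
                (c for c in self.candidates if self._near(x, y, c[0], c[1])),
                None,
            )
            if cand is None:
                cand = [x, y, 1, 0]
                self.candidates.append(cand)
            else:
                cand[0], cand[1] = x, y
                cand[2] += 1
            if cand[2] >= self.k:
                self.candidates = [c for c in self.candidates if c is not cand]
                self.confirmed.append((cand[0], cand[1]))

    def get_positions(self):
        """Only confirmed tracks are visible to callers."""
        return list(self.confirmed)
```

A slow-moving target re-matches its own candidate every frame and is promoted after `k` hits, while a centroid that jumps by more than `radius_m` each frame keeps spawning fresh candidates that expire unpromoted. With `k=1` every detection is promoted immediately, which is the no-op case mentioned above.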

Verified in headless Webots (HERDING_USE_GT=0, LiDAR only):
  BC diff/field:        5/5 @ 11698, 10/10 @ 15079
  RL diff/field:        5/5 @ 10039, 9/10 @ 18200 (timeout)
  Strömbom diff/field:  5/5 @ 7528
All previously 0/n. 120 unit tests pass; 9 new consensus tests cover
the candidate stage, promotion radius, and one-shot phantom rejection.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 20:19:11 +00:00
Johnny Fernandes 876e14e74f LSTM (RecurrentPPO) experiment + recurrent policy support
Adds RecurrentPPO-based training as an alternative to MLP+frame-stack.
The LSTM gives the policy unbounded temporal memory, addressing the
partial-obs failure mode of the 140° Webots LiDAR (tracker briefly
empties when the dog turns; sporadic phantom tracks confuse decisions).

* training/rl/train_lstm.py: from-scratch RecurrentPPO trainer (no BC
  init, no KL term since there's no reference). Uses HERDING_WEBOTS
  preset so the obs distribution matches deployment.
* training/eval.py: auto-detects RecurrentPPO zips, maintains LSTM
  hidden state across steps, resets between episodes.
* controllers/shepherd_dog/policy_loader.py: PolicyHandle supports
  recurrent policies — state managed inside, reset_recurrent() exposed.
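
A rough sketch of how a handle can keep the LSTM state internal. It is not the project's `PolicyHandle`: the class name `RecurrentPolicyHandle` and the `act` method are hypothetical. The `predict(obs, state=..., episode_start=..., deterministic=...)` call follows the Stable-Baselines3 / sb3-contrib prediction interface, and a single (non-vectorised) environment is assumed.

```python
import numpy as np


class RecurrentPolicyHandle:
    """Wrap a recurrent model so callers never touch the hidden state."""

    def __init__(self, model):
        self.model = model
        self.reset_recurrent()

    def reset_recurrent(self):
        """Forget the hidden state; call this between episodes."""
        self._state = None
        self._episode_start = np.ones((1,), dtype=bool)

    def act(self, obs):
        action, self._state = self.model.predict(
            obs,
            state=self._state,
            episode_start=self._episode_start,
            deterministic=True,
        )
        # After the first step of an episode the flag must stay False.
        self._episode_start = np.zeros((1,), dtype=bool)
        return action
```

Keeping the state inside the handle lets the controller loop call `act(obs)` the same way for MLP and recurrent policies, with `reset_recurrent()` as the only extra hook.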

Result on diff/field after 3M steps:
- Gym (default 360°): 69% avg success across n=1..10
- Gym (HERDING_WEBOTS preset, training env): 2% — penning 3-4/5 but
  rarely all 5
- Webots LiDAR 140°: 0/5 (same wall as DAgger and v1 policies)

Conclusion: architectural changes (LSTM vs MLP) don't close the
perception sim-to-real gap. The gym LiDAR sim doesn't faithfully
reproduce the Webots phantom-track distribution; any policy trained on the
gym proxy fails to handle real Webots phantoms regardless of
architecture. Closing this gap requires either modeling Webots phantom
patterns in the gym sim (multi-day work) or Webots-in-the-loop
training (very slow). See memory/lstm_results.md for details.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 19:22:32 +00:00
Johnny Fernandes dd5ac669e5 Webots sim-to-real fixes, DAgger pipeline, 360° proto variant
Today's session worked across the full Webots delivery stack — found and
fixed a cluster of bugs blocking the BC/RL transfer, then explored
training-side mitigations for the residual perception gap.

Bug fixes:
- Makefile FP_RATE default 2.0 → 0.0: BC demos used fp_rate=0 but RL
  fine-tune defaulted to fp_rate=2, poisoning the BC obs distribution
  and stalling PPO at 0% success across 1.46M+ steps.
- controllers/{shepherd_dog,sheep}/runtime.ini: Webots was launching
  controllers under system python3 (no numpy) and they were crashing
  silently. Pinned to the conda tir env.
- herding/config.py HERDING_WEBOTS preset: pen_latch_depth 0.5 → 2.0,
  max_new_tracks_per_step 3 → 1, static_reject 0.8 → 1.2. Stops phantom
  FPs near the gate from latching as permanently-penned tracks.
- herding/perception/sheep_tracker.py: penned tracks now decay at
  forget_steps × 8 instead of living forever. Adds get_positions
  min_freshness filter for deploy-time use.

Training/eval matches deployment:
- training/bc/collect.py: --dagger-policy flag for DAgger rollouts
  (policy drives, teacher labels) + --use-webots-preset for matched
  140° tracker + DR config.
- controllers/shepherd_dog/shepherd_dog.py: scan-fallback (0, 0.6) when
  BC/RL sees empty sheep_positions — recovers from FOV gaps.
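
The "policy drives, teacher labels" idea behind the DAgger flag can be sketched as a single rollout. `dagger_rollout` and its callable arguments are hypothetical; the real `collect.py` interface is not shown in this log.

```python
def dagger_rollout(env_reset, env_step, policy, teacher, max_steps):
    """Collect (observation, teacher_action) pairs along the learner's path."""
    obs = env_reset()
    samples = []
    for _ in range(max_steps):
        # The teacher labels the state the learner actually visited...
        samples.append((obs, teacher(obs)))
        # ...but the learner's own action is what moves the environment.
        obs, done = env_step(policy(obs))
        if done:
            break
    return samples


if __name__ == "__main__":
    state = {"x": 0}

    def reset():
        state["x"] = 0
        return state["x"]

    def step(action):
        state["x"] += action
        return state["x"], state["x"] >= 3

    print(dagger_rollout(reset, step, lambda o: 1, lambda o: 2, 10))
```

Because the states come from the learner's own trajectory, the aggregated dataset covers the mistakes the policy actually makes, which plain behaviour cloning on teacher rollouts does not.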

Tooling:
- tools/dagger_round.sh: one-shot DAgger round (collect + concat + bc).
- tools/webots_sweep_gt.sh: full sweep with HERDING_USE_GT=1 for the
  perception-gap diagnosis matrix.
- protos/ShepherdDog360.proto: 360° FOV variant for the FOV-ablation
  comparison. Canonical proto stays at 140° per project spec.

Artifacts: v1 BC/RL policies for all 4 (drive × world) combos trained
in clean gym (success: diff/field 90-100%, diff/round 58%, mec/field
60-100%, mec/round 50-100%). DAgger r1/r2 BCs for diff/field show
12%→38% progression on the gym HERDING_WEBOTS proxy but did not transfer
to the actual Webots LiDAR (0/5 throughout). Next: LSTM policy or
learned tracker per the project-state memory.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 17:21:02 +00:00
Johnny Fernandes 5c2ee4bba5 Checkpoint 8 2026-05-12 22:41:03 +01:00
Johnny Fernandes a01a5c9cef Checkpoint 7 2026-05-11 12:21:51 +01:00
Johnny Fernandes fce0e0c786 Checkpoint 6 2026-05-11 10:35:48 +01:00
Johnny Fernandes b457155538 Checkpoint 5 - incomplete 2026-05-11 10:35:39 +01:00
Johnny Fernandes 6688325d89 Checkpoint 4 2026-05-11 00:42:52 +01:00
Johnny Fernandes 2a6db038df Checkpoint 3 2026-05-10 12:46:14 +01:00
Johnny Fernandes 1bb9415414 Checkpoint 2 2026-05-07 22:00:10 +01:00
Johnny Fernandes 90aa3bbcb4 Checkpoint 1 2026-05-07 21:59:58 +01:00
Johnny Fernandes deeae3193e Mimics webots approach better + debug. Lucky number 2026-04-26 18:55:53 +01:00
Johnny Fernandes 1af7d03ce2 Mimic webots physics 2026-04-26 18:22:26 +01:00
Johnny Fernandes ad185b4d7e Approach v4 simpler version 2026-04-26 17:18:20 +01:00
Johnny Fernandes 11e13c6980 Approach v3 w/ south penalty 2026-04-26 14:55:13 +01:00
Johnny Fernandes a44ddb7b08 Approach refinement 2026-04-26 12:59:04 +01:00
Johnny Fernandes b3251fcca3 Sheep training flock _ improver 2026-04-24 22:46:51 +01:00
Johnny Fernandes d599181d22 Sheep training flock _ improver 2026-04-24 21:29:44 +01:00
Johnny Fernandes bf9fe902d9 Sheep training flock of 10 fix? 2026-04-24 17:49:42 +01:00
Johnny Fernandes fcfa2c35c8 Sheep training flock of 10 fix? 2026-04-24 14:54:20 +01:00
Johnny Fernandes 17eb25864e Sheep training flock of 10 fix? 2026-04-24 10:58:36 +01:00
Johnny Fernandes 81dc2aca01 Sheep training flock of 10 2026-04-23 19:22:39 +01:00
Johnny Fernandes fdac0ae0b0 Shepherd Dog RL 2026-04-23 19:22:14 +01:00
Johnny Fernandes f256e99a76 Styling and sheep behaviour 2026-04-22 21:01:42 +01:00