Commit Graph

111 Commits

Author SHA1 Message Date
Johnny Fernandes e86fee5ae8 Per-sheep pen-time metrics, seed support, make webots → menu
* `controllers/shepherd_dog/shepherd_dog.py`
  - Tracks the first step at which each sheep crosses the gate; on
    auto-finish (all sheep penned) prints a `[results]` summary
    block: mode/drive/world/lidar/dogs/seed line, total simulated
    time, per-sheep penning order with absolute step + seconds since
    sim start, and the gate spread between the first and last
    penning.
  - Reads `HERDING_SEED` (env / runtime cfg) and seeds the
    controller's RNG when set. Empty = time-based default = old
    non-deterministic behaviour.
* `controllers/sheep/sheep.py` reads `HERDING_SEED` the same way
  (loading `herding_runtime.cfg` itself so it works even when
  Webots strips env vars) and seeds Python's RNG XOR'd with the
  sheep's name hash, so a fixed seed gives a reproducible flock
  trajectory without all sheep starting from identical wander state.
* `tools/run_webots.sh` writes `HERDING_SEED` into the runtime cfg
  (empty when unset so existing scripts keep their stochastic
  behaviour).
* `tools/webots_menu.sh` gains a Seed prompt (random / fixed
  integer); the launch summary box shows the choice next to the
  perception row.
* `Makefile`
  - `make webots`  now fires the interactive picker (replacing the
    old positional invocation).
  - `make webots_quick MODE=… DRIVE=… WORLD=… N=…` is the old
    positional path, kept for batch / scripted use.

Smoke-tested: menu renders Mode → Drive → World → LiDAR → Dogs
→ Sheep → Perception → Seed → Headless prompts and shows the
selected Seed value in the launch summary. 126 pytest cases still
pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 10:33:34 +00:00
Johnny Fernandes bdaff6a3e1 Interactive Webots launcher (tools/webots_menu.sh)
Single-command picker that prompts for every experimental knob the
project supports, then dispatches to `tools/run_webots.sh` with the
matching env vars. The banner reminds the user that the interpreter
path lives in `tools/setup_env.sh` (or `$HERDING_PYTHON`) so the
"this conda path won't exist on another machine" trap is hard to
fall into.

Prompts, in order:
  Mode          : bc | rl | strombom | sequential | universal
  Drive         : differential | mecanum
  World         : field | field_round
  LiDAR FOV     : 140° | 360°  (skipped when drive=mecanum)
  Dogs          : 1 | 2 (axis-split — only ask leak if 2)
  Sheep         : 1..10
  Perception    : LiDAR | GT bypass
  Headless      : no (windowed) | yes (xvfb-run + fast mode)

Each prompt has a default marked with `*`; pressing Enter through the
whole flow runs the canonical demo (BC / diff / field / 140° /
1 dog / 5 sheep / LiDAR / windowed). The configuration is summarised
in a boxed block before the final "Launch? [Y/n]" confirm.

README quick-start now lists `tools/webots_menu.sh` as the
recommended starting point and shows the env-var-prefixed launcher
invocations (HERDING_LIDAR=360, HERDING_NDOGS=2, HERDING_USE_GT=1)
for non-interactive use.

126 pytest cases still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 02:49:06 +00:00
Johnny Fernandes eadeeafb32 Dual-shepherd soft axis-split (HERDING_AXIS_LEAK)
The strict 100/0 axis mask reaches drive standoff and deadlocks
because each dog has only one degree of freedom left to push the
flock. Soften the mask: each dog leads its assigned axis (full gain)
and contributes ``HERDING_AXIS_LEAK`` on the other axis. ``0.0`` is
the old strict behaviour; ``1.0`` is no mask (both dogs run full
policy, role-redundant). Default ``0.3`` breaks the deadlock while
preserving the "one dog per axis" coordination story.

Implementation:
* `controllers/shepherd_dog/shepherd_dog.py` reads
  `HERDING_AXIS_LEAK` from env / runtime cfg (clamped to [0, 1]),
  prints it next to the axis tag, and multiplies the off-axis
  velocity component by it instead of zeroing.
* `tools/run_webots.sh` writes `HERDING_AXIS_LEAK` into
  `herding_runtime.cfg` so Webots-stripped controller subprocesses
  still see it; defaults to 0.3 when unset.

Webots smoke test (HERDING_NDOGS=2, HERDING_AXIS_LEAK=0.3, strombom,
diff/field, 5 sheep, LiDAR perception, no GT): **5/5 penned at step
13204**, vs the strict 100/0 mask which timed out at 0/5. Penning
trail 1/5 → 2/5 → 4/5 → 5/5 between steps 6200 and 13400 — slower
than single-dog (Strömbom diff/field n=5: 7528) as expected since
the work is split, but the coordination demonstrably succeeds.

This gives the writeup a clean three-row ablation:
  α=0.0  (strict)  → deadlock, 0/5
  α=0.3  (default) → 5/5 @ 13204
  α=1.0  (no mask) → both dogs run full policy (single-dog
                     baseline applied twice; no axis story)

126 pytest cases still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 02:43:40 +00:00
Johnny Fernandes cfbf4a0267 Dual-shepherd axis-split (HERDING_NDOGS=2)
The launcher can now spawn two `ShepherdDog` robots, each masked to a
single axis of motion, so the herding workload is split orthogonally.

Mechanic:
* `HERDING_NDOGS=2` (default 1) tells `tools/run_webots.sh` to replace
  the single-dog node in the generated test world with two copies:
  - `ShepherdDogX` at (-4, -10), `customData "axis=x"`
  - `ShepherdDogY` at (+4, -10), `customData "axis=y"`
  Each spawn position sits south of the field interior so the pair
  doesn't collide with starting sheep.
* `controllers/shepherd_dog/shepherd_dog.py` reads `getCustomData()`
  at startup; when `axis=x|y` it zeroes the off-axis component of every
  action *after* speed modulation and *before* EMA smoothing. With
  `customData` empty the controller behaves identically to single-dog
  mode, so all existing launches are unaffected.
* The dog's emitter line now carries the robot's name
  (`dog:ShepherdDogX:x:y`), and `controllers/sheep/sheep.py` keeps a
  `dogs` dict keyed by name, picking the closest one each step for
  its flee target. Single-dog runs still use the legacy two-field
  `dog:x:y` format thanks to a length check.
* `HERDING_NDOGS` is written into `herding_runtime.cfg` and exported
  to subprocesses so future tooling can read it.

Verified behaviour in Webots smoke tests (HERDING_NDOGS=2, strombom,
diff/field, 5 sheep): both dogs spawn with the expected names and
axis tags, the dual-dog status print appears, each dog acts only on
its assigned axis early in the trial, and the masking is internally
consistent. The pair stalls before penning under pure axis-split
because each dog reaches its drive standoff and then has only one
degree of freedom — useful research finding for the write-up;
coordination strategy (shared CoM, role-switching, etc.) is future
work.

126 pytest cases still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 02:35:38 +00:00
Johnny Fernandes d00da52c3c Portable Python env + 360° LiDAR ablation flag
Two small features.

(1) Portable interpreter
* `tools/setup_env.sh` exports HERDING_PYTHON (default points to the
  project's conda env; override in your shell to retarget).
* Both `controllers/*/runtime.ini` files now use Webots' env-var
  expansion: `COMMAND = $(HERDING_PYTHON)` so the Webots-launched
  controllers pick up the same interpreter as the bash scripts.
* `tools/run_webots.sh`, `tools/webots_sweep{,_gt}.sh` and
  `tools/calibrate_mecanum.sh` all source `setup_env.sh` at the top
  instead of hard-coding `/home/jalf/miniconda3/envs/tir/bin`.
The hard-coded conda path is now exactly one line in `setup_env.sh`'s
fallback default — a single place to edit on a new machine, or
override-once via `export HERDING_PYTHON=...`.

(2) 360° LiDAR FOV ablation
* New `LIDAR_WEBOTS_360` preset matches the existing
  `protos/ShepherdDog360.proto` (360 rays / 2π FOV / 15 m range).
* `tools/run_webots.sh` reads `HERDING_LIDAR=140|360` and swaps the
  diff-drive proto accordingly (mecanum keeps 140° — the
  ShepherdDogMecanum proto has its own LiDAR section). The variant
  is written into `herding_runtime.cfg` so the controller can read
  it even when Webots strips env vars.
* `controllers/shepherd_dog/shepherd_dog.py` picks the matching
  `lidar_cfg` (`HERDING_WEBOTS.lidar` for 140°, `LIDAR_WEBOTS_360`
  otherwise) and feeds it to `detections_from_scan` so the
  perception pipeline interprets ray angles + max range correctly.

Smoke test: `HERDING_LIDAR=360 tools/run_webots.sh 5 strombom
differential field` launches with `ShepherdDog360.proto`, the
controller logs the new mode/drive/world line, and the dog is
penning sheep through 360° perception (4/5 at step 19200 before I
killed the test). No retraining required because the gym already
trains under `LIDAR_FULL` (360° preset).

126 pytest cases still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 02:19:15 +00:00
Johnny Fernandes 7ab69ab0f3 Rename multi-segment functions to two-concept names; polish docstrings
Naming pass: rename functions whose third+ segment is redundant or
implementation-detail, sticking to the codebase's preferred
``noun_verb`` / ``verb_noun`` two-concept idiom. Renames are atomic
across definitions, callers, and tests.

  is_penned_position        →  is_penned
  modulate_speed_near_sheep →  modulate_speed
  mecanum_kinematics_step   →  mecanum_step
  policy_forward_mean       →  forward_mean

Two-concept patterns like ``velocity_to_wheels`` / ``detections_from_scan``
/ ``make_strombom_predictor`` are left alone — they're idiomatic
converters / factories that read as a single concept, and the longer
form aids grep-ability.

Docstring polish:
* ``herding/config.py`` header drops the "previously lived as a
  module-level literal" historical framing — we ship as a single
  thing, so the refactor anecdote no longer earns its keep. The
  usage examples now mention both ``HERDING_WEBOTS`` and
  ``HERDING_MEC_WEBOTS`` presets.

126 pytest cases still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 01:58:15 +00:00
Johnny Fernandes 10c01a938e Drop versioning vocabulary, polish docstrings, fix world-aware policy resolution
User-facing pass after the project was decided to be a single
submission with no inner iterations.

* Remove every "v1"/"v2"/"versioning" reference from the docs:
  - README mecanum section trims the "v1 predates the rewrite" prose
    in favour of a self-contained retrain recipe.
  - The 3.2 GB `training/runs/v1_clean/` backup directory is deleted.
* Refresh control-layer docstrings:
  - `sheep_tracker.py` header now describes the three actual pipeline
    stages (consensus, prediction, pen latching) instead of layering
    the consensus stage on top of a stale "predictive mode" preamble.
  - `controllers/shepherd_dog/shepherd_dog.py` mode list is
    up-to-date — adds `universal`, removes outdated single-policy
    default paths, mentions `HERDING_USE_GT=1` as the perception
    ablation.
* Refresh training command examples:
  - `training/bc/collect.py` and `training/bc/pretrain.py` usage
    snippets show the world-suffixed paths the Makefile actually
    uses; the `--out` arg is now required so old "demos.npz"
    invocations error loudly instead of silently overwriting.
  - `training/README.md` rewritten — drops the legacy `runs/bc`
    diagram, documents the per-(drive, world) pipeline, and adds
    the mecanum retraining caveat.
* Fix policy-directory resolution end-to-end:
  - `tools/run_webots.sh` now tries
    `training/runs/{bc,rl}_<drive>_<world>` first, then the drive-
    only path, then the bare-mode legacy path — matching the actual
    on-disk layout. Previously it looked for `bc_<drive>` (no
    world) and silently fell back to `bc`, masking the world
    selection.
  - `controllers/shepherd_dog/shepherd_dog.py:_resolve_policy_dir`
    has the same fix plus a latent NameError unmasked: it referenced
    `DRIVE_MODE` before that variable was set at module load. The
    block is restructured so MODE/DRIVE_MODE/WORLD are resolved
    first, then the function uses them as explicit arguments.

126 pytest cases still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 01:50:54 +00:00
Johnny Fernandes a584a034e9 Project-wide cleanup: gitignore, dead code, stale artifacts, README
Repo hygiene pass after a long working session.

Files removed:
* stage1_train.log — runtime training log (~125 KB), shouldn't have
  been tracked.
* training/bc/demos.npz — orphan default-name demos file from before
  the world+drive-suffixed naming convention took over; no script
  references it.
* training/runs/bc_dagger{1,2}_differential_field/policy.zip — failed
  DAgger experiment artifacts. Per `memory/dagger_results.md` the
  whole DAgger experiment hit 0/5 on Webots transfer; these checkpoints
  have no consumers.

Untracked-but-deleted (no git change) — also cleaned from disk:
* Root-level runtime logs (43 *.log files, all unused — gitignored now).
* training/bc/{combined,dagger}*.npz (5 huge demo blobs, 2.6 GB
  reclaimed; not committed).
* training/bc/v1/ (2.6 GB backup of pre-DAgger demos; reclaimed).
* training/runs/at_20260426_*/ (orphan timestamped runs; reclaimed).
* All __pycache__/.

Dead code removed:
* `herding/control/strombom.py::compute_action_debug` — no callers
  anywhere in the repo.
* `herding/control/sequential.py::compute_action_debug` — same.
* `herding/control/universal.py::compute_action_diff` — same.

.gitignore extended to cover:
* All *.log files (training/eval/webots logs are runtime artifacts).
* training/bc/*.npz (re-collectable on demand by `make bc_demos`).
* training/bc/v1/.
* .pytest_cache, *.pyc, .claude/.

README refreshed:
* Mecanum + round-world coverage in the headline.
* Quick-start updated for DRIVE/WORLD-suffixed Makefile targets,
  GT-bypass example, and the mecanum-retrain caveat.
* Layout reflects the actual current tree (config.py, both protos,
  both worlds, all tools).
* Results table replaced with the Webots end-to-end numbers from
  the 2026-05-16 sweep (8/8 diff combos + LiDAR/GT comparison).

Verification: 126 pytest cases still pass (was 126 going in — no
test-coverage regression from the dead-code removal).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 01:38:19 +00:00
Johnny Fernandes 3b4c99a6c4 Training pipelines auto-select mecanum-Webots preset
* training/bc/collect.py: --use-webots-preset now picks the
  drive-matched variant. Mecanum drives get HERDING_MEC_WEBOTS
  (with the Webots-calibrated strafe efficiency and bleed) so the
  collected demos reflect the imperfect physical mecanum the
  deployed policy will see. Differential drives still use
  HERDING_WEBOTS (no behaviour change there).
* training/rl/train.py: mecanum fine-tune now *unconditionally*
  applies the HERDING_MEC_WEBOTS robot config to the PPO env (the
  policy must update against the same imperfect kinematics it
  deploys on). Diff fine-tune unchanged.

To retrain a mecanum policy end-to-end against the new proto:

  python -m training.bc.collect --drive-mode mecanum --world field \
    --use-webots-preset \
    --out training/bc/demos_mecanum_field_v2.npz
  python -m training.bc.pretrain --demos training/bc/demos_mecanum_field_v2.npz \
    --out training/runs/bc_mecanum_field_v2 ...
  python -m training.rl.train --bc training/runs/bc_mecanum_field_v2 \
    --out training/runs/rl_mecanum_field_v2 \
    --drive-mode mecanum --world field --use-webots-preset

The same flow for field_round / mecanum/round.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 01:12:06 +00:00
Johnny Fernandes ee77c8606c Gym mecanum kinematics matching to Webots roller-hinge proto
Mecanum proto rewrite in b3cf990 made the wheels truly omnidirectional
in Webots, but with asymmetric slip: forward command produces ~89% of
textbook speed while strafe produces only ~38% plus a consistent
~28% backward bleed-through. v1 BC/RL trained on perfect mecanum
gym kinematics could not herd the new dynamics. To unblock that:

* `mecanum_kinematics_step` gains two parameters that scale the
  realised motion to match a deployed-platform calibration:
    - strafe_efficiency  ∈ (0, 1]  default 1.0
    - strafe_to_forward_bleed     default 0.0
  Forward motion is untouched (textbook X-pattern continues to apply
  to vx_body); only the lateral channel is scaled and bleed is added.
* `RobotConfig` exposes both as drive-config fields with the same
  pass-through defaults so existing diff-drive code and existing
  mecanum training pipelines see no behaviour change.
* `HERDING_MEC_WEBOTS` preset bakes in the values measured against the
  current Webots mecanum proto (strafe_efficiency=0.4,
  strafe_to_forward_bleed=-0.28). Training mecanum BC/RL with this
  preset produces policies that compensate for the imperfect
  physical mecanum at deploy.
* `HerdingEnv` plumbs `RobotConfig.strafe_*` through to
  `mecanum_kinematics_step` so the preset takes effect.
* tools/gen_mecanum_wheels.py is added so the proto's 32 roller
  hinges can be regenerated by editing a single set of constants
  rather than hand-editing 1500+ lines of VRML.

Tests:
* 4 new mecanum_kinematics_step tests (default pass-through, strafe
  scaling, backward bleed, forward unaffected by strafe params).
* 3 new RobotConfig tests (defaults, validation, preset shape).
* Sanity check: gym strafe with HERDING_MEC_WEBOTS over 100 steps
  reproduces the Webots calibration to 2 decimal places.

126 unit tests pass (was 120).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 01:09:47 +00:00
Johnny Fernandes b3cf9909a8 Mecanum proto: replace cylinder wheels with physical roller hinges
Each wheel is now a hub solid + 8 passive HingeJoint rollers (capsules
tilted 45° in body xy plane at the bottom contact point) instead of
a single plain Cylinder. The rollers free-spin around their tilt axes
so the wheel exhibits mecanum X-pattern behaviour: gym-frame strafe
commands now produce body strafe in Webots, where before they
produced wrong-direction motion (the plain cylinders behaved as 4-
wheel skid-steer).

Calibration on flat field, 200 steps each:
                       gym predict      webots out         err
  vx=0.5  vy=0          1.33 m/s +x     1.19 m/s +x       10.9% +x
                        0     m/s +y    -0.10 m/s +y      ~clean
  vx=0    vy=0.5        1.33 m/s +y     0.50 m/s +y       62.1% +y
                        0     m/s +x    -0.37 m/s +x      noticeable
                                                          mecanum
                                                          coupling

Strafe is imperfect (-x bleed-through, magnitude under-shoot) but
direction is correct and the platform is now omnidirectional. Forward
motion is high-fidelity. Tilt signs assigned so diagonal pairs FL+RR
and FR+RL share the same body-frame roller orientation (the standard
X pattern). Two contact-material names "MecanumWheelA/B" are kept for
diagnostic separation; both use the same isotropic Coulomb friction
of 2.0 with forceDependentSlip 0.005.

tools/run_webots.sh ships the matching contactProperties block on
every mecanum launch (re-emitted into the temporary world copy).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 21:54:35 +00:00
Johnny Fernandes 1c197e0ff7 Enable consensus tracker by default + round-world Strömbom fix
Two changes that together raise diff/round gym success ~52%→88% (BC)
and ~68%→88% (RL) without retraining; diff/field stays at 100%.

* TrackerConfig.consensus_k default 1 → 3 (radius 0.5 m, max_age 15
  frames). The same candidate-promotion mechanism that closed the
  Webots LiDAR gap also filters gym tracker phantoms — they show up
  on the round field where sheep run further between detection
  cycles than GATE_M, so each new position spawns a fresh track
  while the stale one persists in memory. SheepTracker() called with
  no tracker_cfg keeps the legacy pass-through behaviour for
  backwards compatibility.
* Strömbom + universal teachers now detect when the natural
  "behind the flock" drive target leaves the curved boundary and
  fall back to pushing the flock radially inward toward the centre.
  Breaks the wall-circling pattern that previously trapped both the
  analytical baselines and the trained policies.

A/B numbers (n_sheep ∈ {1,2,3,5,10}, 5 seeds each, max_steps=15000):

  diff/field  bc:  baseline 100%  consensus 100%
  diff/field  rl:  baseline 100%  consensus 100%
  diff/round  bc:  baseline  52%  consensus  88%
  diff/round  rl:  baseline  68%  consensus  88%

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 21:09:25 +00:00
Johnny Fernandes 03b2df5656 Fix run_webots.sh exit-1 when N=0 (calibration mode)
`active=$(grep -c '^Sheep' "$DST")` returns 0 with exit code 1 when
no sheep are left in the world, which fires set -e and kills the
script before it can launch Webots. Wrap with `|| true` so the
calibration mode (N=0) can actually run.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 20:40:28 +00:00
Johnny Fernandes 2d23289052 Consensus tracker + active scan close Webots 140° LiDAR gap
Two deploy-time fixes that take v1 360°-trained BC/RL from 0/n to n/n
penned on the canonical 140° LiDAR proto for diff/field:

* SheepTracker now supports a consensus stage: new detections start as
  candidate tracks invisible to get_positions(). A candidate must
  accumulate consensus_k matches within consensus_radius_m of itself
  inside a consensus_max_age window to be promoted; otherwise it
  expires. Real sheep self-confirm within 3 frames (≪0.05 m/step);
  wall-return cluster centroids jitter beyond 0.3 m as the dog moves
  and never promote. consensus_k=1 (default) is a no-op so unconfigured
  callers and HERDING_DEFAULT keep prior behaviour.
* HERDING_WEBOTS preset gets consensus_k=3, radius=0.3, max_age=20,
  plus longer forget_steps=300 and predict_steps=180 so confirmed
  sheep persist through long FOV-occlusion gaps a narrow 140° cone
  produces. max_new_tracks_per_step=1 still rate-caps spawn bursts.
* shepherd_dog.py BC/RL empty-obs fallback now rotates the desired
  heading with step_count so the cone actively sweeps the field
  instead of driving due north into the wall.

Verified in headless Webots (HERDING_USE_GT=0, LiDAR only):
  BC diff/field:        5/5 @ 11698, 10/10 @ 15079
  RL diff/field:        5/5 @ 10039, 9/10 @ 18200 (timeout)
  Strömbom diff/field:  5/5 @ 7528
All previously 0/n. 120 unit tests pass; 9 new consensus tests cover
the candidate stage, promotion radius, and one-shot phantom rejection.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 20:19:11 +00:00
Johnny Fernandes 876e14e74f LSTM (RecurrentPPO) experiment + recurrent policy support
Adds RecurrentPPO-based training as an alternative to MLP+frame-stack.
The LSTM gives the policy unbounded temporal memory, addressing the
partial-obs failure mode of the 140° Webots LiDAR (tracker briefly
empties when the dog turns; sporadic phantom tracks confuse decisions).

* training/rl/train_lstm.py: from-scratch RecurrentPPO trainer (no BC
  init, no KL term since there's no reference). Uses HERDING_WEBOTS
  preset so the obs distribution matches deployment.
* training/eval.py: auto-detects RecurrentPPO zips, maintains LSTM
  hidden state across steps, resets between episodes.
* controllers/shepherd_dog/policy_loader.py: PolicyHandle supports
  recurrent policies — state managed inside, reset_recurrent() exposed.

Result on diff/field after 3M steps:
- Gym (default 360°): 69% avg success across n=1..10
- Gym (HERDING_WEBOTS preset, training env): 2% — penning 3-4/5 but
  rarely all 5
- Webots LiDAR 140°: 0/5 (same wall as DAgger and v1 policies)

Conclusion: architectural changes (LSTM vs MLP) don't close the
perception sim-to-real gap. The gym LiDAR sim doesn't faithfully
reproduce Webots phantom-track distribution; any policy trained on the
gym proxy fails to handle real Webots phantoms regardless of
architecture. Closing this gap requires either modeling Webots phantom
patterns in the gym sim (multi-day work) or Webots-in-the-loop
training (very slow). See memory/lstm_results.md for details.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 19:22:32 +00:00
Johnny Fernandes dd5ac669e5 Webots sim-to-real fixes, DAgger pipeline, 360° proto variant
Today's session worked across the full Webots delivery stack — found and
fixed a cluster of bugs blocking the BC/RL transfer, then explored
training-side mitigations for the residual perception gap.

Bug fixes:
- Makefile FP_RATE default 2.0 → 0.0: BC demos used fp_rate=0 but RL
  fine-tune defaulted to fp_rate=2, poisoning the BC obs distribution
  and stalling PPO at 0% success across 1.46M+ steps.
- controllers/{shepherd_dog,sheep}/runtime.ini: Webots was launching
  controllers under system python3 (no numpy) and they were crashing
  silently. Pinned to the conda tir env.
- herding/config.py HERDING_WEBOTS preset: pen_latch_depth 0.5 → 2.0,
  max_new_tracks_per_step 3 → 1, static_reject 0.8 → 1.2. Stops phantom
  FPs near the gate from latching as permanently-penned tracks.
- herding/perception/sheep_tracker.py: penned tracks now decay at
  forget_steps × 8 instead of living forever. Adds get_positions
  min_freshness filter for deploy-time use.

Training/eval matches deployment:
- training/bc/collect.py: --dagger-policy flag for DAgger rollouts
  (policy drives, teacher labels) + --use-webots-preset for matched
  140° tracker + DR config.
- controllers/shepherd_dog/shepherd_dog.py: scan-fallback (0, 0.6) when
  BC/RL sees empty sheep_positions — recovers from FOV gaps.

Tooling:
- tools/dagger_round.sh: one-shot DAgger round (collect + concat + bc).
- tools/webots_sweep_gt.sh: full sweep with HERDING_USE_GT=1 for the
  perception-gap diagnosis matrix.
- protos/ShepherdDog360.proto: 360° FOV variant for the FOV-ablation
  comparison. Canonical proto stays at 140° per project spec.

Artifacts: v1 BC/RL policies for all 4 (drive × world) combos trained
in clean gym (success: diff/field 90-100%, diff/round 58%, mec/field
60-100%, mec/round 50-100%). DAgger r1/r2 BCs for diff/field show
12%→38% progression on gym HERDING_WEBOTS proxy but did not close
to actual Webots LiDAR (0/5 throughout). Next: LSTM policy or
learned tracker per the project-state memory.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 17:21:02 +00:00
Johnny Fernandes c61df91950 Checkpoint 10 2026-05-13 23:22:17 +01:00
Johnny Fernandes aa598fcb83 Checkpoint 10 2026-05-13 23:14:16 +01:00
Johnny Fernandes 0f807003a5 Results from last checkpoint 2026-05-13 20:26:18 +00:00
Johnny Fernandes 683de740af Checkpoint 9 2026-05-13 13:46:50 +01:00
Johnny Fernandes be58ad2054 Results from last checkpoinr 2026-05-13 07:49:17 +00:00
Johnny Fernandes 5c2ee4bba5 Checkpoint 8 2026-05-12 22:41:03 +01:00
Johnny Fernandes a01a5c9cef Checkpoint 7 2026-05-11 12:21:51 +01:00
Johnny Fernandes fce0e0c786 Checkpoint 6 2026-05-11 10:35:48 +01:00
Johnny Fernandes b457155538 Checkpoint 5 - incomplete 2026-05-11 10:35:39 +01:00
Johnny Fernandes 6688325d89 Checkpoint 4 2026-05-11 00:42:52 +01:00
Johnny Fernandes 2a6db038df Checkpoint 3 2026-05-10 12:46:14 +01:00
Johnny Fernandes 1bb9415414 Checkpoint 2 2026-05-07 22:00:10 +01:00
Johnny Fernandes 90aa3bbcb4 Checkpoint 1 2026-05-07 21:59:58 +01:00
Johnny Fernandes 80a314b9e9 Trying attention method 2026-04-26 22:32:13 +01:00
Johnny Fernandes a2363d882f Trying attention method 2026-04-26 22:28:43 +01:00
Johnny Fernandes 57b1735e1a Mimics webots approach better + debug. Lucky number 2026-04-26 20:36:36 +01:00
Johnny Fernandes deeae3193e Mimics webots approach better + debug. Lucky number 2026-04-26 18:55:53 +01:00
Johnny Fernandes 1af7d03ce2 Mimic webots physics 2026-04-26 18:22:26 +01:00
Johnny Fernandes 8110fc3143 Run n3 2026-04-26 16:42:55 +00:00
Johnny Fernandes ad185b4d7e Approach v4 simpler version 2026-04-26 17:18:20 +01:00
Johnny Fernandes 27fe6d1bf5 Run v3 2026-04-26 16:01:30 +00:00
Johnny Fernandes e2883212c5 Approach v3 w/ south penalty fix 2026-04-26 15:26:24 +01:00
Johnny Fernandes 11e13c6980 Approach v3 w/ south penalty 2026-04-26 14:55:13 +01:00
Johnny Fernandes a561f8a697 Run v2 2026-04-26 13:32:48 +00:00
Johnny Fernandes a44ddb7b08 Approach refinement 2026-04-26 12:59:04 +01:00
Johnny Fernandes acf0810425 Test26_1200 2026-04-26 11:04:23 +00:00
Johnny Fernandes 3cfd6b5e81 Approach refinement 2026-04-26 02:55:14 +01:00
Johnny Fernandes d1aab20322 Approach refinement 2026-04-26 02:19:10 +01:00
Johnny Fernandes 287743709a Approach refinement 2026-04-26 02:02:25 +01:00
Johnny Fernandes 61f8a7db15 Cleanup and new approach 2026-04-26 01:50:01 +01:00
Johnny Fernandes b031473758 Behaviour refinement - fence penalty 2026-04-26 01:09:50 +01:00
Johnny Fernandes 6253850620 Behaviour refinement - fence penalty 2026-04-25 23:42:02 +01:00
Johnny Fernandes 6612dbc1ba Test25_2330 2026-04-25 22:32:06 +00:00
Johnny Fernandes 7b87908410 Behaviour refinement 2026-04-25 21:35:23 +01:00