TIR_PROJ

Author	SHA1	Message	Date
Johnny Fernandes	1df84ae4b5	Drop webots_quick target; mecanum BC demos now auto-use HERDING_MEC_WEBOTS * Remove `webots_quick` Makefile target — `make webots` is the only webots entry point now (it fires the interactive picker). The positional non-interactive path is still available as `bash tools/run_webots.sh N MODE DRIVE WORLD` for scripted use. * Add `WEBOTS_PRESET_FLAG = --use-webots-preset` for mecanum drive and pass it to the `bc.collect` recipe so demos are collected under the gym kinematics that match the physical-roller Webots mecanum. Without this, mecanum BC demos would record textbook X-pattern teacher actions against textbook gym kinematics, and the resulting policy would fail at deployment exactly the same way the current v1 mecanum policies do. * `rl/train.py` already auto-detects mecanum and applies HERDING_MEC_WEBOTS internally (commit `3b4c99a`), so the rl recipe doesn't need the flag — a one-line comment in the Makefile makes that intent explicit. Diff drive keeps the existing recipe: no --use-webots-preset, so BC demos collected on HERDING_DEFAULT (360° gym, no FP). This is the regime that produced the current diff/field and diff/round policies that pen 5/5 in Webots LiDAR; retraining under the same regime is the safest reproduction. 126 pytest cases still pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 10:44:15 +00:00
Johnny Fernandes	e86fee5ae8	Per-sheep pen-time metrics, seed support, make webots → menu * `controllers/shepherd_dog/shepherd_dog.py` - Tracks the first step at which each sheep crosses the gate; on auto-finish (all sheep penned) prints a `[results]` summary block: mode/drive/world/lidar/dogs/seed line, total simulated time, per-sheep penning order with absolute step + seconds since sim start, and the gate spread between the first and last penning. - Reads `HERDING_SEED` (env / runtime cfg) and seeds the controller's RNG when set. Empty = time-based default = old non-deterministic behaviour. * `controllers/sheep/sheep.py` reads `HERDING_SEED` the same way (loading `herding_runtime.cfg` itself so it works even when Webots strips env vars) and seeds Python's RNG XOR'd with the sheep's name hash, so a fixed seed gives a reproducible flock trajectory without all sheep starting from identical wander state. * `tools/run_webots.sh` writes `HERDING_SEED` into the runtime cfg (empty when unset so existing scripts keep their stochastic behaviour). * `tools/webots_menu.sh` gains a Seed prompt (random / fixed integer); the launch summary box shows the choice next to the perception row. * `Makefile` - `make webots` now fires the interactive picker (replacing the old positional invocation). - `make webots_quick MODE=… DRIVE=… WORLD=… N=…` is the old positional path, kept for batch / scripted use. Smoke-tested: menu renders Mode → Drive → World → LiDAR → Dogs → Sheep → Perception → Seed → Headless prompts and shows the selected Seed value in the launch summary. 126 pytest cases still pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 10:33:34 +00:00
Johnny Fernandes	bdaff6a3e1	Interactive Webots launcher (tools/webots_menu.sh) Single-command picker that prompts for every experimental knob the project supports, then dispatches to `tools/run_webots.sh` with the matching env vars. The banner reminds the user that the interpreter path lives in `tools/setup_env.sh` (or `$HERDING_PYTHON`) so the "this conda path won't exist on another machine" trap is hard to fall into. Prompts, in order: Mode: bc | rl | strombom | sequential | universal; Drive: differential | mecanum; World: field | field_round; LiDAR FOV: 140° | 360° (skipped when drive=mecanum); Dogs: 1 | 2 (axis-split; the leak is only asked for 2); Sheep: 1..10; Perception: LiDAR | GT bypass; Headless: no (windowed) | yes (xvfb-run + fast mode). Each prompt has a default marked with `*`; pressing Enter through the whole flow runs the canonical demo (BC / diff / field / 140° / 1 dog / 5 sheep / LiDAR / windowed). The configuration is summarised in a boxed block before the final "Launch? [Y/n]" confirm. README quick-start now lists `tools/webots_menu.sh` as the recommended starting point and shows the env-var-prefixed launcher invocations (HERDING_LIDAR=360, HERDING_NDOGS=2, HERDING_USE_GT=1) for non-interactive use. 126 pytest cases still pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 02:49:06 +00:00
Johnny Fernandes	eadeeafb32	Dual-shepherd soft axis-split (HERDING_AXIS_LEAK) The strict 100/0 axis mask reaches drive standoff and deadlocks because each dog has only one degree of freedom left to push the flock. Soften the mask: each dog leads its assigned axis (full gain) and contributes ``HERDING_AXIS_LEAK`` on the other axis. ``0.0`` is the old strict behaviour; ``1.0`` is no mask (both dogs run full policy, role-redundant). Default ``0.3`` breaks the deadlock while preserving the "one dog per axis" coordination story. Implementation: * `controllers/shepherd_dog/shepherd_dog.py` reads `HERDING_AXIS_LEAK` from env / runtime cfg (clamped to [0, 1]), prints it next to the axis tag, and multiplies the off-axis velocity component by it instead of zeroing. * `tools/run_webots.sh` writes `HERDING_AXIS_LEAK` into `herding_runtime.cfg` so Webots-stripped controller subprocesses still see it; defaults to 0.3 when unset. Webots smoke test (HERDING_NDOGS=2, HERDING_AXIS_LEAK=0.3, strombom, diff/field, 5 sheep, LiDAR perception, no GT): 5/5 penned at step 13204, vs the strict 100/0 mask which timed out at 0/5. Penning trail 1/5 → 2/5 → 4/5 → 5/5 between steps 6200 and 13400, slower than single-dog (Strömbom diff/field n=5: 7528) as expected since the work is split, but the coordination demonstrably succeeds. This gives the writeup a clean three-row ablation: α=0.0 (strict): deadlock, 0/5; α=0.3 (default): 5/5 @ 13204; α=1.0 (no mask): both dogs run the full policy (single-dog baseline applied twice; no axis story). 126 pytest cases still pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 02:43:40 +00:00
Johnny Fernandes	cfbf4a0267	Dual-shepherd axis-split (HERDING_NDOGS=2) The launcher can now spawn two `ShepherdDog` robots, each masked to a single axis of motion, so the herding workload is split orthogonally. Mechanic: * `HERDING_NDOGS=2` (default 1) tells `tools/run_webots.sh` to replace the single-dog node in the generated test world with two copies: - `ShepherdDogX` at (-4, -10), `customData "axis=x"` - `ShepherdDogY` at (+4, -10), `customData "axis=y"` Each spawn position sits south of the field interior so the pair doesn't collide with starting sheep. * `controllers/shepherd_dog/shepherd_dog.py` reads `getCustomData()` at startup; when `axis=x|y` it zeroes the off-axis component of every action after speed modulation and before EMA smoothing. With `customData` empty the controller behaves identically to single-dog mode, so all existing launches are unaffected. * The dog's emitter line now carries the robot's name (`dog:ShepherdDogX:x:y`), and `controllers/sheep/sheep.py` keeps a `dogs` dict keyed by name, picking the closest one each step for its flee target. Single-dog runs still use the legacy two-field `dog:x:y` format thanks to a length check. * `HERDING_NDOGS` is written into `herding_runtime.cfg` and exported to subprocesses so future tooling can read it. Verified behaviour in Webots smoke tests (HERDING_NDOGS=2, strombom, diff/field, 5 sheep): both dogs spawn with the expected names and axis tags, the dual-dog status print appears, each dog acts only on its assigned axis early in the trial, and the masking is internally consistent. The pair stalls before penning under pure axis-split because each dog reaches its drive standoff and then has only one degree of freedom; this is a useful research finding for the write-up, and a coordination strategy (shared CoM, role-switching, etc.) is future work. 126 pytest cases still pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 02:35:38 +00:00
Johnny Fernandes	d00da52c3c	Portable Python env + 360° LiDAR ablation flag Two small features. (1) Portable interpreter * `tools/setup_env.sh` exports HERDING_PYTHON (default points to the project's conda env; override in your shell to retarget). * Both `controllers//runtime.ini` files now use Webots' env-var expansion: `COMMAND = $(HERDING_PYTHON)` so the Webots-launched controllers pick up the same interpreter as the bash scripts. `tools/run_webots.sh`, `tools/webots_sweep{,_gt}.sh` and `tools/calibrate_mecanum.sh` all source `setup_env.sh` at the top instead of hard-coding `/home/jalf/miniconda3/envs/tir/bin`. The hard-coded conda path is now exactly one line in `setup_env.sh`'s fallback default: a single place to edit on a new machine, or to override once via `export HERDING_PYTHON=...`. (2) 360° LiDAR FOV ablation * New `LIDAR_WEBOTS_360` preset matches the existing `protos/ShepherdDog360.proto` (360 rays / 2π FOV / 15 m range). * `tools/run_webots.sh` reads `HERDING_LIDAR=140|360` and swaps the diff-drive proto accordingly (mecanum keeps 140°, since the ShepherdDogMecanum proto has its own LiDAR section). The variant is written into `herding_runtime.cfg` so the controller can read it even when Webots strips env vars. * `controllers/shepherd_dog/shepherd_dog.py` picks the matching `lidar_cfg` (`HERDING_WEBOTS.lidar` for 140°, `LIDAR_WEBOTS_360` otherwise) and feeds it to `detections_from_scan` so the perception pipeline interprets ray angles + max range correctly. Smoke test: `HERDING_LIDAR=360 tools/run_webots.sh 5 strombom differential field` launches with `ShepherdDog360.proto`, the controller logs the new mode/drive/world line, and the dog is penning sheep through 360° perception (4/5 at step 19200 before I killed the test). No retraining required because the gym already trains under `LIDAR_FULL` (360° preset). 126 pytest cases still pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 02:19:15 +00:00
Johnny Fernandes	7ab69ab0f3	Rename multi-segment functions to two-concept names; polish docstrings Naming pass: rename functions whose third+ segment is redundant or an implementation detail, sticking to the codebase's preferred ``noun_verb`` / ``verb_noun`` two-concept idiom. Renames are atomic across definitions, callers, and tests: is_penned_position → is_penned; modulate_speed_near_sheep → modulate_speed; mecanum_kinematics_step → mecanum_step; policy_forward_mean → forward_mean. Two-concept patterns like ``velocity_to_wheels`` / ``detections_from_scan`` / ``make_strombom_predictor`` are left alone: they're idiomatic converters / factories that read as a single concept, and the longer form aids grep-ability. Docstring polish: * ``herding/config.py`` header drops the "previously lived as a module-level literal" historical framing; we ship as a single thing, so the refactor anecdote no longer earns its keep. The usage examples now mention both ``HERDING_WEBOTS`` and ``HERDING_MEC_WEBOTS`` presets. 126 pytest cases still pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 01:58:15 +00:00
Johnny Fernandes	10c01a938e	Drop versioning vocabulary, polish docstrings, fix world-aware policy resolution User-facing pass after deciding the project ships as a single submission with no inner iterations. * Remove every "v1"/"v2"/"versioning" reference from the docs: - README mecanum section trims the "v1 predates the rewrite" prose in favour of a self-contained retrain recipe. - The 3.2 GB `training/runs/v1_clean/` backup directory is deleted. * Refresh control-layer docstrings: - `sheep_tracker.py` header now describes the three actual pipeline stages (consensus, prediction, pen latching) instead of layering the consensus stage on top of a stale "predictive mode" preamble. - `controllers/shepherd_dog/shepherd_dog.py` mode list is up to date: adds `universal`, removes outdated single-policy default paths, mentions `HERDING_USE_GT=1` as the perception ablation. * Refresh training command examples: - `training/bc/collect.py` and `training/bc/pretrain.py` usage snippets show the world-suffixed paths the Makefile actually uses; the `--out` arg is now required so old "demos.npz" invocations error loudly instead of silently overwriting. - `training/README.md` rewritten: drops the legacy `runs/bc` diagram, documents the per-(drive, world) pipeline, and adds the mecanum retraining caveat. * Fix policy-directory resolution end-to-end: - `tools/run_webots.sh` now tries `training/runs/{bc,rl}_<drive>_<world>` first, then the drive-only path, then the bare-mode legacy path, matching the actual on-disk layout. Previously it looked for `bc_<drive>` (no world) and silently fell back to `bc`, masking the world selection. - `controllers/shepherd_dog/shepherd_dog.py:_resolve_policy_dir` gets the same fix, plus a fix for a latent NameError it unmasked: the function referenced `DRIVE_MODE` before that variable was set at module load. The block is restructured so MODE/DRIVE_MODE/WORLD are resolved first, then the function uses them as explicit arguments. 126 pytest cases still pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 01:50:54 +00:00
Johnny Fernandes	a584a034e9	Project-wide cleanup: gitignore, dead code, stale artifacts, README Repo hygiene pass after a long working session. Files removed: * stage1_train.log: runtime training log (~125 KB), shouldn't have been tracked. * training/bc/demos.npz: orphan default-name demos file from before the world+drive-suffixed naming convention took over; no script references it. * training/runs/bc_dagger{1,2}_differential_field/policy.zip: failed DAgger experiment artifacts. Per `memory/dagger_results.md` the whole DAgger experiment hit 0/5 on Webots transfer; these checkpoints have no consumers. Untracked-but-deleted (no git change), also cleaned from disk: * Root-level runtime logs (43 .log files, all unused; gitignored now). * training/bc/{combined,dagger}.npz (5 huge demo blobs, 2.6 GB reclaimed; not committed). * training/bc/v1/ (2.6 GB backup of pre-DAgger demos; reclaimed). * training/runs/at_20260426_/ (orphan timestamped runs; reclaimed). * All __pycache__/. Dead code removed: * `herding/control/strombom.py::compute_action_debug`: no callers anywhere in the repo. * `herding/control/sequential.py::compute_action_debug`: same. * `herding/control/universal.py::compute_action_diff`: same. .gitignore extended to cover: * All .log files (training/eval/webots logs are runtime artifacts). * training/bc/.npz (re-collectable on demand by `make bc_demos`). * training/bc/v1/. * .pytest_cache, .pyc, .claude/. README refreshed: * Mecanum + round-world coverage in the headline. * Quick-start updated for DRIVE/WORLD-suffixed Makefile targets, GT-bypass example, and the mecanum-retrain caveat. * Layout reflects the actual current tree (config.py, both protos, both worlds, all tools). * Results table replaced with the Webots end-to-end numbers from the 2026-05-16 sweep (8/8 diff combos + LiDAR/GT comparison). Verification: 126 pytest cases still pass (126 going in, so no test-coverage regression from the dead-code removal). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 01:38:19 +00:00
Johnny Fernandes	3b4c99a6c4	Training pipelines auto-select mecanum-Webots preset * training/bc/collect.py: --use-webots-preset now picks the drive-matched variant. Mecanum drives get HERDING_MEC_WEBOTS (with the Webots-calibrated strafe efficiency and bleed) so the collected demos reflect the imperfect physical mecanum the deployed policy will see. Differential drives still use HERDING_WEBOTS (no behaviour change there). * training/rl/train.py: mecanum fine-tune now unconditionally applies the HERDING_MEC_WEBOTS robot config to the PPO env (the policy must update against the same imperfect kinematics it deploys on). Diff fine-tune unchanged. To retrain a mecanum policy end-to-end against the new proto: `python -m training.bc.collect --drive-mode mecanum --world field --use-webots-preset --out training/bc/demos_mecanum_field_v2.npz`, then `python -m training.bc.pretrain --demos training/bc/demos_mecanum_field_v2.npz --out training/runs/bc_mecanum_field_v2 ...`, then `python -m training.rl.train --bc training/runs/bc_mecanum_field_v2 --out training/runs/rl_mecanum_field_v2 --drive-mode mecanum --world field --use-webots-preset`. The same flow applies to field_round (mecanum/round). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 01:12:06 +00:00
Johnny Fernandes	ee77c8606c	Gym mecanum kinematics matching to Webots roller-hinge proto Mecanum proto rewrite in `b3cf990` made the wheels truly omnidirectional in Webots, but with asymmetric slip: forward command produces ~89% of textbook speed while strafe produces only ~38% plus a consistent ~28% backward bleed-through. v1 BC/RL trained on perfect mecanum gym kinematics could not herd the new dynamics. To unblock that: * `mecanum_kinematics_step` gains two parameters that scale the realised motion to match a deployed-platform calibration: - strafe_efficiency ∈ (0, 1] default 1.0 - strafe_to_forward_bleed default 0.0 Forward motion is untouched (textbook X-pattern continues to apply to vx_body); only the lateral channel is scaled and bleed is added. * `RobotConfig` exposes both as drive-config fields with the same pass-through defaults so existing diff-drive code and existing mecanum training pipelines see no behaviour change. * `HERDING_MEC_WEBOTS` preset bakes in the values measured against the current Webots mecanum proto (strafe_efficiency=0.4, strafe_to_forward_bleed=-0.28). Training mecanum BC/RL with this preset produces policies that compensate for the imperfect physical mecanum at deploy. * `HerdingEnv` plumbs `RobotConfig.strafe_` through to `mecanum_kinematics_step` so the preset takes effect. tools/gen_mecanum_wheels.py is added so the proto's 32 roller hinges can be regenerated by editing a single set of constants rather than hand-editing 1500+ lines of VRML. Tests: * 4 new mecanum_kinematics_step tests (default pass-through, strafe scaling, backward bleed, forward unaffected by strafe params). * 3 new RobotConfig tests (defaults, validation, preset shape). * Sanity check: gym strafe with HERDING_MEC_WEBOTS over 100 steps reproduces the Webots calibration to 2 decimal places. 126 unit tests pass (was 120). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 01:09:47 +00:00
Johnny Fernandes	b3cf9909a8	Mecanum proto: replace cylinder wheels with physical roller hinges Each wheel is now a hub solid + 8 passive HingeJoint rollers (capsules tilted 45° in the body xy plane at the bottom contact point) instead of a single plain Cylinder. The rollers free-spin around their tilt axes so the wheel exhibits mecanum X-pattern behaviour: gym-frame strafe commands now produce body strafe in Webots, where before they produced wrong-direction motion (the plain cylinders behaved as 4-wheel skid-steer). Calibration on flat field, 200 steps each (gym prediction vs Webots output, error): command vx=0.5 vy=0: +x 1.33 m/s vs 1.19 m/s (10.9%), +y 0 m/s vs -0.10 m/s (~clean); command vx=0 vy=0.5: +y 1.33 m/s vs 0.50 m/s (62.1%), +x 0 m/s vs -0.37 m/s (noticeable mecanum coupling). Strafe is imperfect (-x bleed-through, magnitude under-shoot) but direction is correct and the platform is now omnidirectional. Forward motion is high-fidelity. Tilt signs are assigned so diagonal pairs FL+RR and FR+RL share the same body-frame roller orientation (the standard X pattern). Two contact-material names "MecanumWheelA/B" are kept for diagnostic separation; both use the same isotropic Coulomb friction of 2.0 with forceDependentSlip 0.005. tools/run_webots.sh ships the matching contactProperties block on every mecanum launch (re-emitted into the temporary world copy). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 21:54:35 +00:00
Johnny Fernandes	1c197e0ff7	Enable consensus tracker by default + round-world Strömbom fix Two changes that together raise diff/round gym success ~52%→88% (BC) and ~68%→88% (RL) without retraining; diff/field stays at 100%. * TrackerConfig.consensus_k default 1 → 3 (radius 0.5 m, max_age 15 frames). The same candidate-promotion mechanism that closed the Webots LiDAR gap also filters gym tracker phantoms: they show up on the round field, where sheep run further between detection cycles than GATE_M, so each new position spawns a fresh track while the stale one persists in memory. SheepTracker() called with no tracker_cfg keeps the legacy pass-through behaviour for backwards compatibility. * Strömbom + universal teachers now detect when the natural "behind the flock" drive target falls outside the curved boundary and fall back to pushing the flock radially inward toward the centre. This breaks the wall-circling pattern that previously trapped both the analytical baselines and the trained policies. A/B numbers (n_sheep ∈ {1,2,3,5,10}, 5 seeds each, max_steps=15000): diff/field bc: baseline 100%, consensus 100%; diff/field rl: baseline 100%, consensus 100%; diff/round bc: baseline 52%, consensus 88%; diff/round rl: baseline 68%, consensus 88%. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 21:09:25 +00:00
Johnny Fernandes	03b2df5656	Fix run_webots.sh exit-1 when N=0 (calibration mode) `active=$(grep -c '^Sheep' "$DST")` prints 0 and exits with status 1 when no sheep are left in the world, which trips set -e and kills the script before it can launch Webots. Append `|| true` so the calibration mode (N=0) can actually run. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 20:40:28 +00:00
Johnny Fernandes	2d23289052	Consensus tracker + active scan close Webots 140° LiDAR gap Two deploy-time fixes that take v1 360°-trained BC/RL from 0/n to n/n penned on the canonical 140° LiDAR proto for diff/field: * SheepTracker now supports a consensus stage: new detections start as candidate tracks invisible to get_positions(). A candidate must accumulate consensus_k matches within consensus_radius_m of itself inside a consensus_max_age window to be promoted; otherwise it expires. Real sheep self-confirm within 3 frames (≪0.05 m/step); wall-return cluster centroids jitter beyond 0.3 m as the dog moves and never promote. consensus_k=1 (default) is a no-op so unconfigured callers and HERDING_DEFAULT keep prior behaviour. * HERDING_WEBOTS preset gets consensus_k=3, radius=0.3, max_age=20, plus longer forget_steps=300 and predict_steps=180 so confirmed sheep persist through long FOV-occlusion gaps a narrow 140° cone produces. max_new_tracks_per_step=1 still rate-caps spawn bursts. * shepherd_dog.py BC/RL empty-obs fallback now rotates the desired heading with step_count so the cone actively sweeps the field instead of driving due north into the wall. Verified in headless Webots (HERDING_USE_GT=0, LiDAR only): BC diff/field: 5/5 @ 11698, 10/10 @ 15079 RL diff/field: 5/5 @ 10039, 9/10 @ 18200 (timeout) Strömbom diff/field: 5/5 @ 7528 All previously 0/n. 120 unit tests pass; 9 new consensus tests cover the candidate stage, promotion radius, and one-shot phantom rejection. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 20:19:11 +00:00
Johnny Fernandes	876e14e74f	LSTM (RecurrentPPO) experiment + recurrent policy support Adds RecurrentPPO-based training as an alternative to MLP+frame-stack. The LSTM gives the policy unbounded temporal memory, addressing the partial-obs failure mode of the 140° Webots LiDAR (tracker briefly empties when the dog turns; sporadic phantom tracks confuse decisions). * training/rl/train_lstm.py: from-scratch RecurrentPPO trainer (no BC init, no KL term since there's no reference). Uses HERDING_WEBOTS preset so the obs distribution matches deployment. * training/eval.py: auto-detects RecurrentPPO zips, maintains LSTM hidden state across steps, resets between episodes. * controllers/shepherd_dog/policy_loader.py: PolicyHandle supports recurrent policies — state managed inside, reset_recurrent() exposed. Result on diff/field after 3M steps: - Gym (default 360°): 69% avg success across n=1..10 - Gym (HERDING_WEBOTS preset, training env): 2% — penning 3-4/5 but rarely all 5 - Webots LiDAR 140°: 0/5 (same wall as DAgger and v1 policies) Conclusion: architectural changes (LSTM vs MLP) don't close the perception sim-to-real gap. The gym LiDAR sim doesn't faithfully reproduce Webots phantom-track distribution; any policy trained on the gym proxy fails to handle real Webots phantoms regardless of architecture. Closing this gap requires either modeling Webots phantom patterns in the gym sim (multi-day work) or Webots-in-the-loop training (very slow). See memory/lstm_results.md for details. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 19:22:32 +00:00
Johnny Fernandes	dd5ac669e5	Webots sim-to-real fixes, DAgger pipeline, 360° proto variant Today's session worked across the full Webots delivery stack: found and fixed a cluster of bugs blocking the BC/RL transfer, then explored training-side mitigations for the residual perception gap. Bug fixes: - Makefile FP_RATE default 2.0 → 0.0: BC demos used fp_rate=0 but RL fine-tune defaulted to fp_rate=2, poisoning the BC obs distribution and stalling PPO at 0% success across 1.46M+ steps. - controllers/{shepherd_dog,sheep}/runtime.ini: Webots was launching controllers under system python3 (no numpy) and they were crashing silently. Pinned to the conda tir env. - herding/config.py HERDING_WEBOTS preset: pen_latch_depth 0.5 → 2.0, max_new_tracks_per_step 3 → 1, static_reject 0.8 → 1.2. Stops phantom FPs near the gate from latching as permanently-penned tracks. - herding/perception/sheep_tracker.py: penned tracks now decay at forget_steps × 8 instead of living forever. Adds get_positions min_freshness filter for deploy-time use. Training/eval now match deployment: - training/bc/collect.py: --dagger-policy flag for DAgger rollouts (policy drives, teacher labels) + --use-webots-preset for matched 140° tracker + DR config. - controllers/shepherd_dog/shepherd_dog.py: scan-fallback (0, 0.6) when BC/RL sees empty sheep_positions, which recovers from FOV gaps. Tooling: - tools/dagger_round.sh: one-shot DAgger round (collect + concat + bc). - tools/webots_sweep_gt.sh: full sweep with HERDING_USE_GT=1 for the perception-gap diagnosis matrix. - protos/ShepherdDog360.proto: 360° FOV variant for the FOV-ablation comparison. Canonical proto stays at 140° per project spec. Artifacts: v1 BC/RL policies for all 4 (drive × world) combos trained in clean gym (success: diff/field 90-100%, diff/round 58%, mec/field 60-100%, mec/round 50-100%). DAgger r1/r2 BCs for diff/field show 12%→38% progression on the gym HERDING_WEBOTS proxy but did not close the gap on actual Webots LiDAR (0/5 throughout). Next: LSTM policy or learned tracker per the project-state memory. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 17:21:02 +00:00
Johnny Fernandes	c61df91950	Checkpoint 10	2026-05-13 23:22:17 +01:00
Johnny Fernandes	aa598fcb83	Checkpoint 10	2026-05-13 23:14:16 +01:00
Johnny Fernandes	0f807003a5	Results from last checkpoint	2026-05-13 20:26:18 +00:00
Johnny Fernandes	683de740af	Checkpoint 9	2026-05-13 13:46:50 +01:00
Johnny Fernandes	be58ad2054	Results from last checkpoint	2026-05-13 07:49:17 +00:00
Johnny Fernandes	5c2ee4bba5	Checkpoint 8	2026-05-12 22:41:03 +01:00
Johnny Fernandes	a01a5c9cef	Checkpoint 7	2026-05-11 12:21:51 +01:00
Johnny Fernandes	fce0e0c786	Checkpoint 6	2026-05-11 10:35:48 +01:00
Johnny Fernandes	b457155538	Checkpoint 5 - incomplete	2026-05-11 10:35:39 +01:00
Johnny Fernandes	6688325d89	Checkpoint 4	2026-05-11 00:42:52 +01:00
Johnny Fernandes	2a6db038df	Checkpoint 3	2026-05-10 12:46:14 +01:00
Johnny Fernandes	1bb9415414	Checkpoint 2	2026-05-07 22:00:10 +01:00
Johnny Fernandes	90aa3bbcb4	Checkpoint 1	2026-05-07 21:59:58 +01:00
Johnny Fernandes	80a314b9e9	Trying attention method	2026-04-26 22:32:13 +01:00
Johnny Fernandes	a2363d882f	Trying attention method	2026-04-26 22:28:43 +01:00
Johnny Fernandes	57b1735e1a	Mimics webots approach better + debug. Lucky number	2026-04-26 20:36:36 +01:00
Johnny Fernandes	deeae3193e	Mimics webots approach better + debug. Lucky number	2026-04-26 18:55:53 +01:00
Johnny Fernandes	1af7d03ce2	Mimic webots physics	2026-04-26 18:22:26 +01:00
Johnny Fernandes	8110fc3143	Run n3	2026-04-26 16:42:55 +00:00
Johnny Fernandes	ad185b4d7e	Approach v4 simpler version	2026-04-26 17:18:20 +01:00
Johnny Fernandes	27fe6d1bf5	Run v3	2026-04-26 16:01:30 +00:00
Johnny Fernandes	e2883212c5	Approach v3 w/ south penalty fix	2026-04-26 15:26:24 +01:00
Johnny Fernandes	11e13c6980	Approach v3 w/ south penalty	2026-04-26 14:55:13 +01:00
Johnny Fernandes	a561f8a697	Run v2	2026-04-26 13:32:48 +00:00
Johnny Fernandes	a44ddb7b08	Approach refinement	2026-04-26 12:59:04 +01:00
Johnny Fernandes	acf0810425	Test26_1200	2026-04-26 11:04:23 +00:00
Johnny Fernandes	3cfd6b5e81	Approach refinement	2026-04-26 02:55:14 +01:00
Johnny Fernandes	d1aab20322	Approach refinement	2026-04-26 02:19:10 +01:00
Johnny Fernandes	287743709a	Approach refinement	2026-04-26 02:02:25 +01:00
Johnny Fernandes	61f8a7db15	Cleanup and new approach	2026-04-26 01:50:01 +01:00
Johnny Fernandes	b031473758	Behaviour refinement - fence penalty	2026-04-26 01:09:50 +01:00
Johnny Fernandes	6253850620	Behaviour refinement - fence penalty	2026-04-25 23:42:02 +01:00
Johnny Fernandes	6612dbc1ba	Test25_2330	2026-04-25 22:32:06 +00:00
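
The soft axis-split from commit eadeeafb32 scales the off-axis component by `HERDING_AXIS_LEAK`. That value is clamped to [0, 1] and defaults to 0.3. The sketch below illustrates the idea under stated assumptions and is not the controller code.

- `clamp_leak` and `apply_axis_mask` are hypothetical names.
- The action is assumed to be a plain `(vx, vy)` pair.
- Commit cfbf4a0267 describes the mask as applied after speed modulation and before EMA smoothing. The sketch assumes that same ordering.

```python
from typing import Optional, Tuple


def clamp_leak(raw: Optional[str], default: float = 0.3) -> float:
    """Parse a HERDING_AXIS_LEAK-style string and clamp it to [0, 1] (hypothetical helper)."""
    if raw is None or raw.strip() == "":
        return default
    try:
        value = float(raw)
    except ValueError:
        return default
    return min(1.0, max(0.0, value))


def apply_axis_mask(
    action: Tuple[float, float], axis: Optional[str], leak: float
) -> Tuple[float, float]:
    """Lead the assigned axis at full gain and scale the other axis by `leak`.

    axis=None (empty customData) returns the action unchanged, i.e. single-dog mode.
    leak=0.0 reproduces the strict 100/0 mask; leak=1.0 disables masking.
    """
    vx, vy = action
    if axis == "x":
        return vx, vy * leak
    if axis == "y":
        return vx * leak, vy
    return vx, vy
```

Keeping the mask a pure function of `(action, axis, leak)` makes the three ablation rows (α = 0.0, 0.3, 1.0) a one-argument change.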
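
Commit cfbf4a0267 has each sheep parse `dog:<name>:x:y` emitter lines, falling back to the legacy `dog:x:y` format via a length check. Each sheep then flees whichever dog is closest. Below is a hedged sketch of that parsing and selection. The names `parse_dog_message` and `nearest_dog`, and the placeholder key `"dog"` for legacy messages, are illustrative assumptions rather than names taken from `controllers/sheep/sheep.py`.

```python
import math
from typing import Dict, Optional, Tuple

Vec2 = Tuple[float, float]


def parse_dog_message(msg: str) -> Optional[Tuple[str, Vec2]]:
    """Parse 'dog:<name>:x:y' or the legacy single-dog 'dog:x:y' (hypothetical helper)."""
    parts = msg.split(":")
    if parts[0] != "dog":
        return None
    if len(parts) == 4:
        name, xs, ys = parts[1], parts[2], parts[3]
    elif len(parts) == 3:
        # Legacy format carries no name; key it under a fixed placeholder (assumption).
        name, xs, ys = "dog", parts[1], parts[2]
    else:
        return None
    try:
        return name, (float(xs), float(ys))
    except ValueError:
        return None


def nearest_dog(sheep: Vec2, dogs: Dict[str, Vec2]) -> Optional[Vec2]:
    """Position of the closest known dog, or None if no dog has been heard yet."""
    if not dogs:
        return None
    return min(dogs.values(), key=lambda p: math.dist(sheep, p))


# Spawn positions from the commit: ShepherdDogX at (-4, -10), ShepherdDogY at (+4, -10).
dogs: Dict[str, Vec2] = {}
for msg in ("dog:ShepherdDogX:-4.0:-10.0", "dog:ShepherdDogY:4.0:-10.0"):
    parsed = parse_dog_message(msg)
    if parsed is not None:
        name, pos = parsed
        dogs[name] = pos
target = nearest_dog((3.0, -8.0), dogs)  # -> (4.0, -10.0), ShepherdDogY
```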
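
The consensus stage from commit 2d23289052 keeps new detections as invisible candidates. A candidate is promoted only once it collects `consensus_k` matches within `consensus_radius_m` inside a `consensus_max_age` window. The sketch below models this under simplifying assumptions:

- Candidates are anchored at their first sighting.
- A candidate counts at most one hit per frame.
- `ConsensusGate` is a hypothetical stand-in for the stage inside `SheepTracker`, not its real API.

Defaults follow the `HERDING_WEBOTS` values given in that commit (k=3, 0.3 m, 20 frames).

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Vec2 = Tuple[float, float]


@dataclass
class Candidate:
    pos: Vec2   # anchored at the first sighting (assumption)
    born: int   # frame index of the first sighting
    last: int   # last frame that contributed a hit
    hits: int = 1


@dataclass
class ConsensusGate:
    k: int = 3             # consensus_k
    radius_m: float = 0.3  # consensus_radius_m
    max_age: int = 20      # consensus_max_age, in frames
    candidates: List[Candidate] = field(default_factory=list)

    def step(self, frame: int, detections: List[Vec2]) -> List[Vec2]:
        """Feed one frame of detections; return positions promoted on this frame."""
        # Candidates that did not reach k hits inside the window expire.
        self.candidates = [c for c in self.candidates if frame - c.born <= self.max_age]
        promoted: List[Vec2] = []
        for det in detections:
            near = [c for c in self.candidates if math.dist(c.pos, det) <= self.radius_m]
            if near:
                cand = min(near, key=lambda c: math.dist(c.pos, det))
                if cand.last == frame:
                    continue  # count at most one hit per candidate per frame
                cand.hits += 1
                cand.last = frame
            else:
                cand = Candidate(pos=det, born=frame, last=frame)
                self.candidates.append(cand)
            if cand.hits >= self.k:
                promoted.append(cand.pos)
                self.candidates = [c for c in self.candidates if c is not cand]
        return promoted
```

With `k=1`, every detection is promoted on its first frame, which matches the pass-through behaviour the commit describes for unconfigured callers. A wall-return centroid that jumps more than `radius_m` between frames keeps spawning fresh candidates and never promotes.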
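
Commit ee77c8606c adds `strafe_efficiency` and `strafe_to_forward_bleed` to the gym mecanum step and leaves forward motion untouched. The sketch below works at the body-velocity level only. `realised_body_velocity` is a hypothetical name.

The bleed is modelled as proportional to `|vy|`, so a negative coefficient always pushes backward. This is an assumption: the commit calls the bleed "consistent" but does not say how it behaves for negative strafe commands.

```python
from typing import Tuple


def realised_body_velocity(
    vx_cmd: float,
    vy_cmd: float,
    strafe_efficiency: float = 1.0,
    strafe_to_forward_bleed: float = 0.0,
) -> Tuple[float, float]:
    """Scale the lateral channel and add strafe-induced forward bleed (hypothetical helper).

    The defaults are a pass-through, so callers that never set the two
    parameters see textbook mecanum behaviour.
    """
    if not 0.0 < strafe_efficiency <= 1.0:
        raise ValueError("strafe_efficiency must lie in (0, 1]")
    vy = strafe_efficiency * vy_cmd
    # Assumption: bleed scales with |vy_cmd|, so a negative coefficient always
    # pushes backward regardless of strafe direction.
    vx = vx_cmd + strafe_to_forward_bleed * abs(vy_cmd)
    return vx, vy
```

Take the preset values (0.4, -0.28) and the textbook strafe speed of 1.33 m/s. This gives about 0.53 m/s lateral and -0.37 m/s forward, close to the 0.50 m/s and -0.37 m/s Webots calibration recorded in commit b3cf9909a8.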
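
Commit 10c01a938e fixes policy-directory resolution to use this lookup order:

1. `training/runs/{bc,rl}_<drive>_<world>`
2. The drive-only path.
3. The bare-mode legacy path.

Below is a minimal sketch of that order. It assumes a hypothetical `resolve_policy_dir` helper with an injectable `is_dir` predicate, so the order can be checked without touching the filesystem.

```python
from pathlib import Path
from typing import Callable, Optional


def resolve_policy_dir(
    runs_root: Path,
    mode: str,
    drive: str,
    world: str,
    is_dir: Callable[[Path], bool] = Path.is_dir,
) -> Optional[Path]:
    """Return the first existing policy directory, most specific first (hypothetical helper)."""
    candidates = [
        runs_root / f"{mode}_{drive}_{world}",  # e.g. bc_differential_field
        runs_root / f"{mode}_{drive}",          # drive-only layout
        runs_root / mode,                       # bare-mode legacy layout
    ]
    for path in candidates:
        if is_dir(path):
            return path
    return None
```

In this sketch, returning `None` rather than substituting a default keeps a missing world-specific policy visible. The commit identifies silent fallback as the bug that masked the world selection.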
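
Commit e86fee5ae8 seeds each sheep from `HERDING_SEED` XOR'd with a hash of the sheep's name. One caveat is worth stating: Python's built-in `hash()` on strings is salted per process unless `PYTHONHASHSEED` is set, so a reproducible flock needs a stable hash. The log does not say which hash `controllers/sheep/sheep.py` actually uses. The sketch below makes these assumptions:

- It uses `zlib.crc32` as the stable hash.
- It returns a per-sheep `random.Random` instead of reseeding the global RNG.
- `sheep_rng` is a hypothetical name.

```python
import random
import zlib
from typing import Optional


def sheep_rng(seed_text: Optional[str], sheep_name: str) -> random.Random:
    """Per-sheep RNG: reproducible when a seed is given, OS/time-seeded otherwise (hypothetical)."""
    if not seed_text:
        # Empty or unset seed keeps the old non-deterministic behaviour.
        return random.Random()
    base = int(seed_text)
    # crc32 is stable across processes, unlike the salted built-in hash() for str.
    name_hash = zlib.crc32(sheep_name.encode("utf-8"))
    return random.Random(base ^ name_hash)
```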

112 Commits