TIR_PROJ

Author	SHA1	Message	Date
Johnny Fernandes	0a27ad9a26	Full retrain pipeline + hybrid policy set Ran an end-to-end clean retrain + gym eval + 24-cell Webots sweep (tools/full_pipeline.sh). Results. Differential: all 16 cells pen N/N; updated policies committed. Mecanum: the new training stochastically regressed (only 2/8 cells vs the v2 baseline's 4/8), so the v2 baseline mec policies are RESTORED in this commit (training/runs/{bc,rl}_mecanum_*) and remain the deliverable. The retrain pipeline itself is committed for reproducibility (tools/full_pipeline.sh: clean → train_all → eval_all → 24-cell Webots sweep). The v2 mec policies are also backed up locally to _backup_pretrain/mec_v2_baseline/ (gitignored). Verified after restore: bc mec field_round n=10 → 10/10 in 147 s sim; rl diff field n=5 → 5/5 in 137 s sim	2026-05-20 08:07:39 +00:00
Johnny Fernandes	07d1ece3d4	Allow strafe_efficiency=1.0 in mec preset test; minor comment cleanup After a deep investigation into the n=5 mecanum sim-to-real gap, all attempted fixes (consensus tightening, wall_reject tightening, static-phantom drop, deploy-time track merge, in-tracker track merge, fp_rate-augmented retrain, max_range cap, 140° mecanum retrain) failed to reliably pen n=5 in Webots without regressing n=10. The phantom-track problem with a 360° LiDAR and a small flock is genuinely hard and out of scope for the deadline; documented in docs/status.md. Results preserved from the previous mecanum work: * 16/16 differential cells pen N/N. * 4/8 mecanum cells (all n=10) pen 10/10 via Supervisor kinematic injection (commit `27c0f65`). * n=5 mecanum is the known gap. Small changes that survived the iteration: * tests/test_config.py: strafe_efficiency=1.0 is now valid (kinematic injection means the gym preset and Webots controller share the formula, so textbook values produce gym-identical body motion). * tools/run_webots.sh: refreshed the LiDAR-variant comment. * training/rl/train.py: comment polish.	2026-05-19 15:57:27 +00:00
Johnny Fernandes	27c0f65722	Mecanum Webots via Supervisor kinematic injection Replace the failing ODE-rolled mecanum chassis dynamics with a Supervisor.setVelocity call that uses the gym mecanum forward kinematics formula directly. Wheel motors still spin (visual); chassis motion comes from the gym model so training and deployment match by construction. Results (seed=42, n=10 sheep): BC + RL mecanum pen 10/10 in both field and field_round. n=5 mecanum cells still 0/5 due to tracker phantoms anchored to wall corners under the 360° LiDAR — documented in docs/status.md as the remaining gap. Cleanup: drop deploy-time hacks (HERDING_HEADING_, HERDING_OMEGA_CLAMP, HERDING_TRACKER_) that were workarounds for the old ODE chaos; revert the proto inertiaMatrix, roller dampingConstant, and reduced motor torque since they no longer carry load; refresh comments around the mecanum config presets.	2026-05-18 22:46:37 +00:00
Johnny Fernandes	e86fee5ae8	Per-sheep pen-time metrics, seed support, make webots → menu * `controllers/shepherd_dog/shepherd_dog.py` - Tracks the first step at which each sheep crosses the gate; on auto-finish (all sheep penned) prints a `[results]` summary block: mode/drive/world/lidar/dogs/seed line, total simulated time, per-sheep penning order with absolute step + seconds since sim start, and the gate spread between the first and last penning. - Reads `HERDING_SEED` (env / runtime cfg) and seeds the controller's RNG when set. Empty = time-based default = old non-deterministic behaviour. * `controllers/sheep/sheep.py` reads `HERDING_SEED` the same way (loading `herding_runtime.cfg` itself so it works even when Webots strips env vars) and seeds Python's RNG XOR'd with the sheep's name hash, so a fixed seed gives a reproducible flock trajectory without all sheep starting from identical wander state. * `tools/run_webots.sh` writes `HERDING_SEED` into the runtime cfg (empty when unset so existing scripts keep their stochastic behaviour). * `tools/webots_menu.sh` gains a Seed prompt (random / fixed integer); the launch summary box shows the choice next to the perception row. * `Makefile` - `make webots` now fires the interactive picker (replacing the old positional invocation). - `make webots_quick MODE=… DRIVE=… WORLD=… N=…` is the old positional path, kept for batch / scripted use. Smoke-tested: menu renders Mode → Drive → World → LiDAR → Dogs → Sheep → Perception → Seed → Headless prompts and shows the selected Seed value in the launch summary. 126 pytest cases still pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 10:33:34 +00:00
Johnny Fernandes	bdaff6a3e1	Interactive Webots launcher (tools/webots_menu.sh) Single-command picker that prompts for every experimental knob the project supports, then dispatches to `tools/run_webots.sh` with the matching env vars. The banner reminds the user that the interpreter path lives in `tools/setup_env.sh` (or `$HERDING_PYTHON`) so the "this conda path won't exist on another machine" trap is hard to fall into. Prompts, in order: Mode: bc | rl | strombom | sequential | universal; Drive: differential | mecanum; World: field | field_round; LiDAR FOV: 140° | 360° (skipped when drive=mecanum); Dogs: 1 | 2 (axis-split; the leak prompt only appears for 2); Sheep: 1..10; Perception: LiDAR | GT bypass; Headless: no (windowed) | yes (xvfb-run + fast mode). Each prompt has a default marked with `*`; pressing Enter through the whole flow runs the canonical demo (BC / diff / field / 140° / 1 dog / 5 sheep / LiDAR / windowed). The configuration is summarised in a boxed block before the final "Launch? [Y/n]" confirm. README quick-start now lists `tools/webots_menu.sh` as the recommended starting point and shows the env-var-prefixed launcher invocations (HERDING_LIDAR=360, HERDING_NDOGS=2, HERDING_USE_GT=1) for non-interactive use. 126 pytest cases still pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 02:49:06 +00:00
Johnny Fernandes	eadeeafb32	Dual-shepherd soft axis-split (HERDING_AXIS_LEAK) Under the strict 100/0 axis mask the dogs reach drive standoff and deadlock, because each dog has only one degree of freedom left to push the flock. Soften the mask: each dog leads its assigned axis (full gain) and contributes ``HERDING_AXIS_LEAK`` on the other axis. ``0.0`` is the old strict behaviour; ``1.0`` is no mask (both dogs run the full policy, role-redundant). The default ``0.3`` breaks the deadlock while preserving the "one dog per axis" coordination story. Implementation: * `controllers/shepherd_dog/shepherd_dog.py` reads `HERDING_AXIS_LEAK` from env / runtime cfg (clamped to [0, 1]), prints it next to the axis tag, and multiplies the off-axis velocity component by it instead of zeroing it. * `tools/run_webots.sh` writes `HERDING_AXIS_LEAK` into `herding_runtime.cfg` so Webots-stripped controller subprocesses still see it; defaults to 0.3 when unset. Webots smoke test (HERDING_NDOGS=2, HERDING_AXIS_LEAK=0.3, strombom, diff/field, 5 sheep, LiDAR perception, no GT): 5/5 penned at step 13204, vs the strict 100/0 mask, which timed out at 0/5. Penning trail 1/5 → 2/5 → 4/5 → 5/5 between steps 6200 and 13400: slower than single-dog (Strömbom diff/field n=5: step 7528), as expected since the work is split, but the coordination demonstrably succeeds. This gives the writeup a clean three-row ablation: α=0.0 (strict) → deadlock, 0/5; α=0.3 (default) → 5/5 @ step 13204; α=1.0 (no mask) → both dogs run the full policy (single-dog baseline applied twice; no axis story). 126 pytest cases still pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 02:43:40 +00:00
Johnny Fernandes	cfbf4a0267	Dual-shepherd axis-split (HERDING_NDOGS=2) The launcher can now spawn two `ShepherdDog` robots, each masked to a single axis of motion, so the herding workload is split orthogonally. Mechanic: * `HERDING_NDOGS=2` (default 1) tells `tools/run_webots.sh` to replace the single-dog node in the generated test world with two copies: - `ShepherdDogX` at (-4, -10), `customData "axis=x"` - `ShepherdDogY` at (+4, -10), `customData "axis=y"` Each spawn position sits south of the field interior so the pair doesn't collide with starting sheep. * `controllers/shepherd_dog/shepherd_dog.py` reads `getCustomData()` at startup; when `axis=x|y` it zeroes the off-axis component of every action after speed modulation and before EMA smoothing. With `customData` empty the controller behaves identically to single-dog mode, so all existing launches are unaffected. * The dog's emitter line now carries the robot's name (`dog:ShepherdDogX:x:y`), and `controllers/sheep/sheep.py` keeps a `dogs` dict keyed by name, picking the closest one each step for its flee target. Single-dog runs still use the legacy two-field `dog:x:y` format thanks to a length check. * `HERDING_NDOGS` is written into `herding_runtime.cfg` and exported to subprocesses so future tooling can read it. Verified behaviour in Webots smoke tests (HERDING_NDOGS=2, strombom, diff/field, 5 sheep): both dogs spawn with the expected names and axis tags, the dual-dog status print appears, each dog acts only on its assigned axis early in the trial, and the masking is internally consistent. The pair stalls before penning under pure axis-split because each dog reaches its drive standoff and then has only one degree of freedom. This is a useful research finding for the write-up; coordination strategy (shared CoM, role-switching, etc.) is future work. 126 pytest cases still pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 02:35:38 +00:00
Johnny Fernandes	d00da52c3c	Portable Python env + 360° LiDAR ablation flag Two small features. (1) Portable interpreter * `tools/setup_env.sh` exports HERDING_PYTHON (default points to the project's conda env; override in your shell to retarget). * Both `controllers//runtime.ini` files now use Webots' env-var expansion: `COMMAND = $(HERDING_PYTHON)` so the Webots-launched controllers pick up the same interpreter as the bash scripts. `tools/run_webots.sh`, `tools/webots_sweep{,_gt}.sh` and `tools/calibrate_mecanum.sh` all source `setup_env.sh` at the top instead of hard-coding `/home/jalf/miniconda3/envs/tir/bin`. The hard-coded conda path is now exactly one line in `setup_env.sh`'s fallback default: a single place to edit on a new machine, or override-once via `export HERDING_PYTHON=...`. (2) 360° LiDAR FOV ablation * New `LIDAR_WEBOTS_360` preset matches the existing `protos/ShepherdDog360.proto` (360 rays / 2π FOV / 15 m range). * `tools/run_webots.sh` reads `HERDING_LIDAR=140|360` and swaps the diff-drive proto accordingly (mecanum keeps 140°; the ShepherdDogMecanum proto has its own LiDAR section). The variant is written into `herding_runtime.cfg` so the controller can read it even when Webots strips env vars. * `controllers/shepherd_dog/shepherd_dog.py` picks the matching `lidar_cfg` (`HERDING_WEBOTS.lidar` for 140°, `LIDAR_WEBOTS_360` otherwise) and feeds it to `detections_from_scan` so the perception pipeline interprets ray angles + max range correctly. Smoke test: `HERDING_LIDAR=360 tools/run_webots.sh 5 strombom differential field` launches with `ShepherdDog360.proto`, the controller logs the new mode/drive/world line, and the dog is penning sheep through 360° perception (4/5 at step 19200 before I killed the test). No retraining required because the gym already trains under `LIDAR_FULL` (360° preset). 126 pytest cases still pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 02:19:15 +00:00
Johnny Fernandes	10c01a938e	Drop versioning vocabulary, polish docstrings, fix world-aware policy resolution User-facing pass after deciding that the project ships as a single submission with no intermediate iterations. * Remove every "v1"/"v2"/"versioning" reference from the docs: - README mecanum section trims the "v1 predates the rewrite" prose in favour of a self-contained retrain recipe. - The 3.2 GB `training/runs/v1_clean/` backup directory is deleted. * Refresh control-layer docstrings: - `sheep_tracker.py` header now describes the three actual pipeline stages (consensus, prediction, pen latching) instead of layering the consensus stage on top of a stale "predictive mode" preamble. - `controllers/shepherd_dog/shepherd_dog.py` mode list is up to date: it adds `universal`, removes outdated single-policy default paths, and mentions `HERDING_USE_GT=1` as the perception ablation. * Refresh training command examples: - `training/bc/collect.py` and `training/bc/pretrain.py` usage snippets show the world-suffixed paths the Makefile actually uses; the `--out` arg is now required so old "demos.npz" invocations error loudly instead of silently overwriting. - `training/README.md` rewritten: it drops the legacy `runs/bc` diagram, documents the per-(drive, world) pipeline, and adds the mecanum retraining caveat. * Fix policy-directory resolution end-to-end: - `tools/run_webots.sh` now tries `training/runs/{bc,rl}_<drive>_<world>` first, then the drive-only path, then the bare-mode legacy path, matching the actual on-disk layout. Previously it looked for `bc_<drive>` (no world) and silently fell back to `bc`, masking the world selection. - `controllers/shepherd_dog/shepherd_dog.py:_resolve_policy_dir` gets the same fix, which also unmasked a latent NameError: it referenced `DRIVE_MODE` before that variable was set at module load. The block is restructured so MODE/DRIVE_MODE/WORLD are resolved first, then the function uses them as explicit arguments. 126 pytest cases still pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 01:50:54 +00:00
Johnny Fernandes	ee77c8606c	Match gym mecanum kinematics to Webots roller-hinge proto The mecanum proto rewrite in `b3cf990` made the wheels truly omnidirectional in Webots, but with asymmetric slip: a forward command produces ~89% of textbook speed, while strafe produces only ~38% plus a consistent ~28% backward bleed-through. v1 BC/RL policies trained on ideal mecanum gym kinematics could not herd under the new dynamics. To unblock that: * `mecanum_kinematics_step` gains two parameters that scale the realised motion to match a deployed-platform calibration: - strafe_efficiency ∈ (0, 1], default 1.0 - strafe_to_forward_bleed, default 0.0 Forward motion is untouched (the textbook X-pattern continues to apply to vx_body); only the lateral channel is scaled and bleed is added. * `RobotConfig` exposes both as drive-config fields with the same pass-through defaults, so existing diff-drive code and existing mecanum training pipelines see no behaviour change. * The `HERDING_MEC_WEBOTS` preset bakes in the values measured against the current Webots mecanum proto (strafe_efficiency=0.4, strafe_to_forward_bleed=-0.28). Training mecanum BC/RL with this preset produces policies that compensate for the imperfect physical mecanum at deploy. * `HerdingEnv` plumbs `RobotConfig.strafe_` through to `mecanum_kinematics_step` so the preset takes effect. tools/gen_mecanum_wheels.py is added so the proto's 32 roller hinges can be regenerated by editing a single set of constants rather than hand-editing 1500+ lines of VRML. Tests: * 4 new mecanum_kinematics_step tests (default pass-through, strafe scaling, backward bleed, forward unaffected by strafe params). * 3 new RobotConfig tests (defaults, validation, preset shape). * Sanity check: gym strafe with HERDING_MEC_WEBOTS over 100 steps reproduces the Webots calibration to 2 decimal places. 126 unit tests pass (was 120). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 01:09:47 +00:00
Johnny Fernandes	b3cf9909a8	Mecanum proto: replace cylinder wheels with physical roller hinges Each wheel is now a hub solid + 8 passive HingeJoint rollers (capsules tilted 45° in the body xy plane at the bottom contact point) instead of a single plain Cylinder. The rollers free-spin around their tilt axes, so the wheel exhibits mecanum X-pattern behaviour: gym-frame strafe commands now produce body strafe in Webots, where before they produced wrong-direction motion (the plain cylinders behaved as a 4-wheel skid-steer). Calibration on flat field, 200 steps each (gym prediction vs Webots output): command vx=0.5, vy=0 gives +x 1.33 m/s predicted vs 1.19 m/s measured (10.9% err) and +y 0 m/s vs -0.10 m/s (~clean); command vx=0, vy=0.5 gives +y 1.33 m/s predicted vs 0.50 m/s measured (62.1% err) and +x 0 m/s vs -0.37 m/s (noticeable mecanum coupling). Strafe is imperfect (-x bleed-through, magnitude undershoot) but the direction is correct and the platform is now omnidirectional. Forward motion is high-fidelity. Tilt signs are assigned so diagonal pairs FL+RR and FR+RL share the same body-frame roller orientation (the standard X pattern). Two contact-material names "MecanumWheelA/B" are kept for diagnostic separation; both use the same isotropic Coulomb friction of 2.0 with forceDependentSlip 0.005. tools/run_webots.sh ships the matching contactProperties block on every mecanum launch (re-emitted into the temporary world copy). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 21:54:35 +00:00
Johnny Fernandes	03b2df5656	Fix run_webots.sh exit-1 when N=0 (calibration mode) `active=$(grep -c '^Sheep' "$DST")` prints 0 and exits with status 1 when no sheep are left in the world, which trips `set -e` and kills the script before it can launch Webots. Wrap it with `|| true` so calibration mode (N=0) can actually run. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 20:40:28 +00:00
Johnny Fernandes	dd5ac669e5	Webots sim-to-real fixes, DAgger pipeline, 360° proto variant Today's session worked across the full Webots delivery stack: it found and fixed a cluster of bugs blocking the BC/RL transfer, then explored training-side mitigations for the residual perception gap. Bug fixes: - Makefile FP_RATE default 2.0 → 0.0: BC demos used fp_rate=0 but RL fine-tune defaulted to fp_rate=2, poisoning the BC obs distribution and stalling PPO at 0% success across 1.46M+ steps. - controllers/{shepherd_dog,sheep}/runtime.ini: Webots was launching controllers under system python3 (no numpy) and they were crashing silently. Pinned to the conda tir env. - herding/config.py HERDING_WEBOTS preset: pen_latch_depth 0.5 → 2.0, max_new_tracks_per_step 3 → 1, static_reject 0.8 → 1.2. Stops phantom FPs near the gate from latching as permanently penned tracks. - herding/perception/sheep_tracker.py: penned tracks now decay at forget_steps × 8 instead of living forever. Adds a get_positions min_freshness filter for deploy-time use. Training/eval now matches deployment: - training/bc/collect.py: --dagger-policy flag for DAgger rollouts (policy drives, teacher labels) + --use-webots-preset for matched 140° tracker + DR config. - controllers/shepherd_dog/shepherd_dog.py: scan-fallback (0, 0.6) when BC/RL sees empty sheep_positions, to recover from FOV gaps. Tooling: - tools/dagger_round.sh: one-shot DAgger round (collect + concat + bc). - tools/webots_sweep_gt.sh: full sweep with HERDING_USE_GT=1 for the perception-gap diagnosis matrix. - protos/ShepherdDog360.proto: 360° FOV variant for the FOV-ablation comparison. Canonical proto stays at 140° per project spec. Artifacts: v1 BC/RL policies for all 4 (drive × world) combos trained in clean gym (success: diff/field 90-100%, diff/round 58%, mec/field 60-100%, mec/round 50-100%). DAgger r1/r2 BCs for diff/field show a 12%→38% progression on the gym HERDING_WEBOTS proxy but did not close the gap to actual Webots LiDAR (0/5 throughout).
Next: LSTM policy or learned tracker per the project-state memory. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 17:21:02 +00:00
Johnny Fernandes	5c2ee4bba5	Checkpoint 8	2026-05-12 22:41:03 +01:00
Johnny Fernandes	a01a5c9cef	Checkpoint 7	2026-05-11 12:21:51 +01:00
Johnny Fernandes	fce0e0c786	Checkpoint 6	2026-05-11 10:35:48 +01:00
Johnny Fernandes	b457155538	Checkpoint 5 - incomplete	2026-05-11 10:35:39 +01:00
Johnny Fernandes	6688325d89	Checkpoint 4	2026-05-11 00:42:52 +01:00
Johnny Fernandes	2a6db038df	Checkpoint 3	2026-05-10 12:46:14 +01:00
Johnny Fernandes	1bb9415414	Checkpoint 2	2026-05-07 22:00:10 +01:00
Johnny Fernandes	f256e99a76	Styling and sheep behaviour	2026-04-22 21:01:42 +01:00

21 Commits
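Commits ee77c8606c and 27c0f65722 describe a calibrated mecanum model (lateral channel scaled by `strafe_efficiency`, a `strafe_to_forward_bleed` term added to the forward channel) and the Supervisor velocity injection that applies it in Webots. The following is a minimal sketch under stated assumptions, not the project's `mecanum_kinematics_step`: it starts from an ideal body-frame twist rather than wheel speeds, the helper names are hypothetical, the bleed is assumed to scale with |vy| (reading "consistent backward bleed" literally), and the world is assumed to be z-up so the heading is a yaw about z.

```python
import math


def calibrated_body_velocity(vx_cmd, vy_cmd, omega_cmd,
                             strafe_efficiency=1.0,
                             strafe_to_forward_bleed=0.0):
    """Apply a strafe calibration to an ideal body-frame twist (hypothetical).

    The commanded forward velocity is kept; the lateral channel is scaled by
    strafe_efficiency, and a bleed term proportional to |vy| is added to the
    forward channel (assumption: bleed sign does not depend on strafe side).
    """
    if not 0.0 < strafe_efficiency <= 1.0:
        raise ValueError("strafe_efficiency must be in (0, 1]")
    vx = vx_cmd + strafe_to_forward_bleed * abs(vy_cmd)
    vy = strafe_efficiency * vy_cmd
    return vx, vy, omega_cmd


def supervisor_velocity(vx_body, vy_body, omega, heading):
    """Rotate a body-frame twist into the world frame.

    Returns the 6-element [vx, vy, vz, wx, wy, wz] list accepted by Webots'
    Node.setVelocity (assumes a z-up world with yaw = heading).
    """
    c, s = math.cos(heading), math.sin(heading)
    return [c * vx_body - s * vy_body,
            s * vx_body + c * vy_body,
            0.0,
            0.0, 0.0, omega]


if __name__ == "__main__":
    # Preset values quoted for HERDING_MEC_WEBOTS in the commit message.
    body = calibrated_body_velocity(0.0, 1.33, 0.0,
                                    strafe_efficiency=0.4,
                                    strafe_to_forward_bleed=-0.28)
    print(body)
    print(supervisor_velocity(*body, heading=math.pi / 2))
```

Inside a Supervisor controller, the returned list would be passed to the robot's own node, e.g. `supervisor.getSelf().setVelocity(v)`; that call is left out so the sketch runs outside Webots.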
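Commit e86fee5ae8 seeds each sheep's RNG with `HERDING_SEED` XOR'd with a hash of the sheep's name, so a fixed seed reproduces the flock without every sheep sharing one wander state. One detail matters if this is done with Python's built-in `hash()`: string hashes are salted per process unless `PYTHONHASHSEED` is set, so the result would change between launches. The sketch below assumes a stable `zlib.crc32` name hash instead and returns a `random.Random` instance; `make_sheep_rng` is a hypothetical name, not the controller's code.

```python
import random
import zlib


def make_sheep_rng(seed_text, sheep_name):
    """Build a per-sheep RNG from HERDING_SEED (hypothetical helper).

    seed_text: raw HERDING_SEED value; empty or whitespace means unseeded.
    sheep_name: the sheep robot's name, used to decorrelate the flock.
    """
    seed_text = seed_text.strip()
    if not seed_text:
        # Unseeded: Python draws the seed from OS entropy (or the clock).
        return random.Random()
    # zlib.crc32 is stable across processes, unlike the salted hash(str).
    name_hash = zlib.crc32(sheep_name.encode("utf-8"))
    return random.Random(int(seed_text) ^ name_hash)


if __name__ == "__main__":
    first = make_sheep_rng("42", "Sheep0").random()
    again = make_sheep_rng("42", "Sheep0").random()
    other = make_sheep_rng("42", "Sheep1").random()
    print(first == again)  # → True: same seed and name reproduce the stream
    print(first == other)
```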
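Commit eadeeafb32 replaces the strict axis mask with a leak factor: each dog keeps its assigned axis at full gain and multiplies the other axis by `HERDING_AXIS_LEAK`. A minimal sketch of that masking step, assuming hypothetical helper names and reading only the environment (the real controller also falls back to `herding_runtime.cfg`); the 0.3 default mirrors the launcher's default.

```python
import os


def read_axis_leak(raw=None, default=0.3):
    """Parse HERDING_AXIS_LEAK and clamp it to [0, 1] (hypothetical helper)."""
    if raw is None:
        raw = os.environ.get("HERDING_AXIS_LEAK", "")
    try:
        value = float(raw)
    except ValueError:
        # Empty or malformed values fall back to the default leak.
        return default
    return min(1.0, max(0.0, value))


def apply_axis_mask(vx, vy, axis, leak):
    """Keep the assigned axis at full gain and scale the other by `leak`.

    axis: "x", "y", or "" (empty customData means single-dog mode, no mask).
    leak=0.0 reproduces the strict 100/0 mask; leak=1.0 disables masking.
    """
    if axis == "x":
        return vx, vy * leak
    if axis == "y":
        return vx * leak, vy
    return vx, vy


if __name__ == "__main__":
    leak = read_axis_leak("0.3")
    print(apply_axis_mask(1.0, 2.0, "x", leak))
```

Per commit cfbf4a0267, the mask sits after speed modulation and before EMA smoothing in the action pipeline.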
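Commit cfbf4a0267 has each sheep parse `dog:<name>:x:y` messages (or the legacy two-field `dog:x:y`, distinguished by field count), keep a dict keyed by dog name, and flee the closest dog. A sketch of that logic; the function names and the legacy key `ShepherdDog` are hypothetical, not taken from `sheep.py`.

```python
import math

LEGACY_DOG_NAME = "ShepherdDog"  # hypothetical key for the legacy 2-field format


def parse_dog_message(msg):
    """Parse 'dog:<name>:<x>:<y>' or legacy 'dog:<x>:<y>'.

    Returns (name, (x, y)), or None if the message is not a dog position.
    """
    parts = msg.split(":")
    if parts[0] != "dog":
        return None
    try:
        if len(parts) == 4:
            return parts[1], (float(parts[2]), float(parts[3]))
        if len(parts) == 3:
            return LEGACY_DOG_NAME, (float(parts[1]), float(parts[2]))
    except ValueError:
        return None
    return None


def closest_dog(dogs, sheep_xy):
    """Return the (x, y) of the dog nearest the sheep, or None if no dogs."""
    if not dogs:
        return None
    sx, sy = sheep_xy
    return min(dogs.values(), key=lambda p: math.hypot(p[0] - sx, p[1] - sy))


if __name__ == "__main__":
    dogs = {}
    for msg in ("dog:ShepherdDogX:-4.0:-10.0", "dog:ShepherdDogY:4.0:-10.0"):
        parsed = parse_dog_message(msg)
        if parsed:
            name, pos = parsed
            dogs[name] = pos
    print(closest_dog(dogs, (3.0, -2.0)))  # → (4.0, -10.0)
```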
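Commit 10c01a938e makes policy lookup try the most specific directory first and pass mode, drive and world as explicit arguments, which is what removes the module-load NameError. A sketch of that resolution order with `pathlib`; `resolve_policy_dir` is a hypothetical stand-in for `_resolve_policy_dir`, and raising when no candidate exists is a choice made for the sketch.

```python
from pathlib import Path


def resolve_policy_dir(runs_root, mode, drive, world):
    """Return the first existing policy directory, most specific first.

    Order: <mode>_<drive>_<world>, then <mode>_<drive>, then bare <mode>.
    All inputs are explicit arguments, so nothing depends on module globals.
    """
    root = Path(runs_root)
    candidates = [
        root / f"{mode}_{drive}_{world}",
        root / f"{mode}_{drive}",
        root / mode,
    ]
    for candidate in candidates:
        if candidate.is_dir():
            return candidate
    tried = ", ".join(str(c) for c in candidates)
    raise FileNotFoundError(f"no policy directory found; tried: {tried}")


if __name__ == "__main__":
    print(resolve_policy_dir("training/runs", "bc", "differential", "field"))
```

Because the bare `<mode>` directory is still a valid last resort, a caller may want to log which candidate was picked so that a fallback is visible rather than silent.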
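Commit dd5ac669e5 lets penned tracks expire after `forget_steps × 8` unseen steps instead of living forever, so a phantom that latched as penned eventually disappears. A minimal sketch of that expiry rule; the `Track` fields and `prune_tracks` are hypothetical, and the real tracker's consensus, prediction and pen-latching stages are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Track:
    """Minimal stand-in for a tracked sheep (hypothetical fields)."""
    x: float
    y: float
    last_seen_step: int
    penned: bool = False


def prune_tracks(tracks, step, forget_steps, penned_multiplier=8):
    """Drop tracks that have gone unseen for too long.

    Free tracks expire after `forget_steps` unseen steps; penned tracks get
    `forget_steps * penned_multiplier` steps before they are dropped.
    """
    kept = []
    for track in tracks:
        limit = forget_steps * (penned_multiplier if track.penned else 1)
        if step - track.last_seen_step <= limit:
            kept.append(track)
    return kept


if __name__ == "__main__":
    tracks = [Track(0.0, 0.0, 0), Track(1.0, 1.0, 0, penned=True)]
    print(len(prune_tracks(tracks, 20, forget_steps=10)))  # → 1 (penned survives)
```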