TIR_PROJ

Author	SHA1	Message	Date
Johnny Fernandes	1df84ae4b5	Drop webots_quick target; mecanum BC demos now auto-use HERDING_MEC_WEBOTS * Remove `webots_quick` Makefile target — `make webots` is the only webots entry point now (it fires the interactive picker). The positional non-interactive path is still available as `bash tools/run_webots.sh N MODE DRIVE WORLD` for scripted use. * Add `WEBOTS_PRESET_FLAG = --use-webots-preset` for mecanum drive and pass it to the `bc.collect` recipe so demos are collected under the gym kinematics that match the physical-roller Webots mecanum. Without this, mecanum BC demos would record textbook X-pattern teacher actions against textbook gym kinematics, and the resulting policy would fail at deployment exactly the same way the current v1 mecanum policies do. * `rl/train.py` already auto-detects mecanum and applies HERDING_MEC_WEBOTS internally (commit `3b4c99a`), so the rl recipe doesn't need the flag — a one-line comment in the Makefile makes that intent explicit. Diff drive keeps the existing recipe: no --use-webots-preset, so BC demos collected on HERDING_DEFAULT (360° gym, no FP). This is the regime that produced the current diff/field and diff/round policies that pen 5/5 in Webots LiDAR; retraining under the same regime is the safest reproduction. 126 pytest cases still pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 10:44:15 +00:00
Johnny Fernandes	e86fee5ae8	Per-sheep pen-time metrics, seed support, make webots → menu * `controllers/shepherd_dog/shepherd_dog.py` - Tracks the first step at which each sheep crosses the gate; on auto-finish (all sheep penned) prints a `[results]` summary block: mode/drive/world/lidar/dogs/seed line, total simulated time, per-sheep penning order with absolute step + seconds since sim start, and the gate spread between the first and last penning. - Reads `HERDING_SEED` (env / runtime cfg) and seeds the controller's RNG when set. Empty = time-based default = old non-deterministic behaviour. * `controllers/sheep/sheep.py` reads `HERDING_SEED` the same way (loading `herding_runtime.cfg` itself so it works even when Webots strips env vars) and seeds Python's RNG XOR'd with the sheep's name hash, so a fixed seed gives a reproducible flock trajectory without all sheep starting from identical wander state. * `tools/run_webots.sh` writes `HERDING_SEED` into the runtime cfg (empty when unset so existing scripts keep their stochastic behaviour). * `tools/webots_menu.sh` gains a Seed prompt (random / fixed integer); the launch summary box shows the choice next to the perception row. * `Makefile` - `make webots` now fires the interactive picker (replacing the old positional invocation). - `make webots_quick MODE=… DRIVE=… WORLD=… N=…` is the old positional path, kept for batch / scripted use. Smoke-tested: menu renders Mode → Drive → World → LiDAR → Dogs → Sheep → Perception → Seed → Headless prompts and shows the selected Seed value in the launch summary. 126 pytest cases still pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 10:33:34 +00:00
Johnny Fernandes	dd5ac669e5	Webots sim-to-real fixes, DAgger pipeline, 360° proto variant Today's session worked across the full Webots delivery stack — found and fixed a cluster of bugs blocking the BC/RL transfer, then explored training-side mitigations for the residual perception gap. Bug fixes: - Makefile FP_RATE default 2.0 → 0.0: BC demos used fp_rate=0 but RL fine-tune defaulted to fp_rate=2, poisoning the BC obs distribution and stalling PPO at 0% success across 1.46M+ steps. - controllers/{shepherd_dog,sheep}/runtime.ini: Webots was launching controllers under system python3 (no numpy) and they were crashing silently. Pinned to the conda tir env. - herding/config.py HERDING_WEBOTS preset: pen_latch_depth 0.5 → 2.0, max_new_tracks_per_step 3 → 1, static_reject 0.8 → 1.2. Stops phantom FPs near the gate from latching as permanently-penned tracks. - herding/perception/sheep_tracker.py: penned tracks now decay at forget_steps × 8 instead of living forever. Adds get_positions min_freshness filter for deploy-time use. Training/eval matches deployment: - training/bc/collect.py: --dagger-policy flag for DAgger rollouts (policy drives, teacher labels) + --use-webots-preset for matched 140° tracker + DR config. - controllers/shepherd_dog/shepherd_dog.py: scan-fallback (0, 0.6) when BC/RL sees empty sheep_positions — recovers from FOV gaps. Tooling: - tools/dagger_round.sh: one-shot DAgger round (collect + concat + bc). - tools/webots_sweep_gt.sh: full sweep with HERDING_USE_GT=1 for the perception-gap diagnosis matrix. - protos/ShepherdDog360.proto: 360° FOV variant for the FOV-ablation comparison. Canonical proto stays at 140° per project spec. Artifacts: v1 BC/RL policies for all 4 (drive × world) combos trained in clean gym (success: diff/field 90-100%, diff/round 58%, mec/field 60-100%, mec/round 50-100%). DAgger r1/r2 BCs for diff/field show 12%→38% progression on gym HERDING_WEBOTS proxy but did not close to actual Webots LiDAR (0/5 throughout). Next: LSTM policy or learned tracker per the project-state memory. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 17:21:02 +00:00
Johnny Fernandes	c61df91950	Checkpoint 10	2026-05-13 23:22:17 +01:00
Johnny Fernandes	aa598fcb83	Checkpoint 10	2026-05-13 23:14:16 +01:00
Johnny Fernandes	683de740af	Checkpoint 9	2026-05-13 13:46:50 +01:00
Johnny Fernandes	5c2ee4bba5	Checkpoint 8	2026-05-12 22:41:03 +01:00
Johnny Fernandes	a01a5c9cef	Checkpoint 7	2026-05-11 12:21:51 +01:00

8 Commits
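
Commit e86fee5ae8 describes seeding each sheep's RNG from `HERDING_SEED` XOR'd with a hash of the sheep's name, with an empty seed keeping the old time-based behaviour. The sketch below illustrates that idea only; it is not the repository's code. The function name `sheep_rng` is hypothetical, and using `zlib.crc32` as the name hash is an assumption, chosen because Python's built-in `hash()` on strings is salted per process unless `PYTHONHASHSEED` is fixed, which would make a name-derived seed differ between runs.

```python
import random
import zlib


def sheep_rng(seed_text: str, sheep_name: str) -> random.Random:
    """Return a per-sheep RNG (hypothetical helper, not from the repo).

    An empty seed string yields an unseeded generator, i.e. the
    non-deterministic default. Otherwise the integer seed is XOR'd with a
    hash of the sheep's name so sheep do not share one wander state.
    """
    if not seed_text.strip():
        return random.Random()  # seeded from OS entropy / time
    # Assumption: crc32 stands in for "the sheep's name hash" because it is
    # stable across interpreter runs, unlike built-in hash() on str.
    name_hash = zlib.crc32(sheep_name.encode("utf-8"))
    return random.Random(int(seed_text) ^ name_hash)


if __name__ == "__main__":
    a = sheep_rng("42", "sheep_1")
    b = sheep_rng("42", "sheep_1")
    # Same seed and same name reproduce the same stream.
    print(a.random() == b.random())
```

With a fixed seed, two runs draw identical values for a given sheep name, while different names draw from different streams.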