TIR_PROJ

Author	SHA1	Message	Date
Johnny Fernandes	3b4c99a6c4	Training pipelines auto-select mecanum-Webots preset * training/bc/collect.py: --use-webots-preset now picks the drive-matched variant. Mecanum drives get HERDING_MEC_WEBOTS (with the Webots-calibrated strafe efficiency and bleed) so the collected demos reflect the imperfect physical mecanum the deployed policy will see. Differential drives still use HERDING_WEBOTS (no behaviour change there). * training/rl/train.py: mecanum fine-tune now unconditionally applies the HERDING_MEC_WEBOTS robot config to the PPO env (the policy must update against the same imperfect kinematics it deploys on). Diff fine-tune unchanged. To retrain a mecanum policy end-to-end against the new proto: python -m training.bc.collect --drive-mode mecanum --world field \ --use-webots-preset \ --out training/bc/demos_mecanum_field_v2.npz python -m training.bc.pretrain --demos training/bc/demos_mecanum_field_v2.npz \ --out training/runs/bc_mecanum_field_v2 ... python -m training.rl.train --bc training/runs/bc_mecanum_field_v2 \ --out training/runs/rl_mecanum_field_v2 \ --drive-mode mecanum --world field --use-webots-preset The same flow for field_round / mecanum/round. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 01:12:06 +00:00
Johnny Fernandes	ee77c8606c	Gym mecanum kinematics matching to Webots roller-hinge proto Mecanum proto rewrite in `b3cf990` made the wheels truly omnidirectional in Webots, but with asymmetric slip: forward command produces ~89% of textbook speed while strafe produces only ~38% plus a consistent ~28% backward bleed-through. v1 BC/RL trained on perfect mecanum gym kinematics could not herd the new dynamics. To unblock that: * `mecanum_kinematics_step` gains two parameters that scale the realised motion to match a deployed-platform calibration: - strafe_efficiency ∈ (0, 1] default 1.0 - strafe_to_forward_bleed default 0.0 Forward motion is untouched (textbook X-pattern continues to apply to vx_body); only the lateral channel is scaled and bleed is added. * `RobotConfig` exposes both as drive-config fields with the same pass-through defaults so existing diff-drive code and existing mecanum training pipelines see no behaviour change. * `HERDING_MEC_WEBOTS` preset bakes in the values measured against the current Webots mecanum proto (strafe_efficiency=0.4, strafe_to_forward_bleed=-0.28). Training mecanum BC/RL with this preset produces policies that compensate for the imperfect physical mecanum at deploy. * `HerdingEnv` plumbs `RobotConfig.strafe_` through to `mecanum_kinematics_step` so the preset takes effect. tools/gen_mecanum_wheels.py is added so the proto's 32 roller hinges can be regenerated by editing a single set of constants rather than hand-editing 1500+ lines of VRML. Tests: * 4 new mecanum_kinematics_step tests (default pass-through, strafe scaling, backward bleed, forward unaffected by strafe params). * 3 new RobotConfig tests (defaults, validation, preset shape). * Sanity check: gym strafe with HERDING_MEC_WEBOTS over 100 steps reproduces the Webots calibration to 2 decimal places. 126 unit tests pass (was 120). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 01:09:47 +00:00
Johnny Fernandes	876e14e74f	LSTM (RecurrentPPO) experiment + recurrent policy support Adds RecurrentPPO-based training as an alternative to MLP+frame-stack. The LSTM gives the policy unbounded temporal memory, addressing the partial-obs failure mode of the 140° Webots LiDAR (tracker briefly empties when the dog turns; sporadic phantom tracks confuse decisions). * training/rl/train_lstm.py: from-scratch RecurrentPPO trainer (no BC init, no KL term since there's no reference). Uses HERDING_WEBOTS preset so the obs distribution matches deployment. * training/eval.py: auto-detects RecurrentPPO zips, maintains LSTM hidden state across steps, resets between episodes. * controllers/shepherd_dog/policy_loader.py: PolicyHandle supports recurrent policies — state managed inside, reset_recurrent() exposed. Result on diff/field after 3M steps: - Gym (default 360°): 69% avg success across n=1..10 - Gym (HERDING_WEBOTS preset, training env): 2% — penning 3-4/5 but rarely all 5 - Webots LiDAR 140°: 0/5 (same wall as DAgger and v1 policies) Conclusion: architectural changes (LSTM vs MLP) don't close the perception sim-to-real gap. The gym LiDAR sim doesn't faithfully reproduce Webots phantom-track distribution; any policy trained on the gym proxy fails to handle real Webots phantoms regardless of architecture. Closing this gap requires either modeling Webots phantom patterns in the gym sim (multi-day work) or Webots-in-the-loop training (very slow). See memory/lstm_results.md for details. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 19:22:32 +00:00
Johnny Fernandes	dd5ac669e5	Webots sim-to-real fixes, DAgger pipeline, 360° proto variant Today's session worked across the full Webots delivery stack — found and fixed a cluster of bugs blocking the BC/RL transfer, then explored training-side mitigations for the residual perception gap. Bug fixes: - Makefile FP_RATE default 2.0 → 0.0: BC demos used fp_rate=0 but RL fine-tune defaulted to fp_rate=2, poisoning the BC obs distribution and stalling PPO at 0% success across 1.46M+ steps. - controllers/{shepherd_dog,sheep}/runtime.ini: Webots was launching controllers under system python3 (no numpy) and they were crashing silently. Pinned to the conda tir env. - herding/config.py HERDING_WEBOTS preset: pen_latch_depth 0.5 → 2.0, max_new_tracks_per_step 3 → 1, static_reject 0.8 → 1.2. Stops phantom FPs near the gate from latching as permanently-penned tracks. - herding/perception/sheep_tracker.py: penned tracks now decay at forget_steps × 8 instead of living forever. Adds get_positions min_freshness filter for deploy-time use. Training/eval matches deployment: - training/bc/collect.py: --dagger-policy flag for DAgger rollouts (policy drives, teacher labels) + --use-webots-preset for matched 140° tracker + DR config. - controllers/shepherd_dog/shepherd_dog.py: scan-fallback (0, 0.6) when BC/RL sees empty sheep_positions — recovers from FOV gaps. Tooling: - tools/dagger_round.sh: one-shot DAgger round (collect + concat + bc). - tools/webots_sweep_gt.sh: full sweep with HERDING_USE_GT=1 for the perception-gap diagnosis matrix. - protos/ShepherdDog360.proto: 360° FOV variant for the FOV-ablation comparison. Canonical proto stays at 140° per project spec. Artifacts: v1 BC/RL policies for all 4 (drive × world) combos trained in clean gym (success: diff/field 90-100%, diff/round 58%, mec/field 60-100%, mec/round 50-100%). DAgger r1/r2 BCs for diff/field show 12%→38% progression on gym HERDING_WEBOTS proxy but did not close to actual Webots LiDAR (0/5 throughout). Next: LSTM policy or learned tracker per the project-state memory. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 17:21:02 +00:00
Johnny Fernandes	0f807003a5	Results from last checkpoint	2026-05-13 20:26:18 +00:00
Johnny Fernandes	be58ad2054	Results from last checkpoint	2026-05-13 07:49:17 +00:00
Johnny Fernandes	5c2ee4bba5	Checkpoint 8	2026-05-12 22:41:03 +01:00
Johnny Fernandes	a01a5c9cef	Checkpoint 7	2026-05-11 12:21:51 +01:00
Johnny Fernandes	fce0e0c786	Checkpoint 6	2026-05-11 10:35:48 +01:00
Johnny Fernandes	b457155538	Checkpoint 5 - incomplete	2026-05-11 10:35:39 +01:00
Johnny Fernandes	6688325d89	Checkpoint 4	2026-05-11 00:42:52 +01:00
Johnny Fernandes	2a6db038df	Checkpoint 3	2026-05-10 12:46:14 +01:00
Johnny Fernandes	1bb9415414	Checkpoint 2	2026-05-07 22:00:10 +01:00
Johnny Fernandes	80a314b9e9	Trying attention method	2026-04-26 22:32:13 +01:00
Johnny Fernandes	a2363d882f	Trying attention method	2026-04-26 22:28:43 +01:00
Johnny Fernandes	57b1735e1a	Mimics webots approach better + debug. Lucky number	2026-04-26 20:36:36 +01:00
Johnny Fernandes	deeae3193e	Mimics webots approach better + debug. Lucky number	2026-04-26 18:55:53 +01:00
Johnny Fernandes	1af7d03ce2	Mimic webots physics	2026-04-26 18:22:26 +01:00
Johnny Fernandes	8110fc3143	Run n3	2026-04-26 16:42:55 +00:00
Johnny Fernandes	27fe6d1bf5	Run v3	2026-04-26 16:01:30 +00:00
Johnny Fernandes	e2883212c5	Approach v3 w/ south penalty fix	2026-04-26 15:26:24 +01:00
Johnny Fernandes	11e13c6980	Approach v3 w/ south penalty	2026-04-26 14:55:13 +01:00
Johnny Fernandes	a561f8a697	Run v2	2026-04-26 13:32:48 +00:00
Johnny Fernandes	a44ddb7b08	Approach refinement	2026-04-26 12:59:04 +01:00
Johnny Fernandes	acf0810425	Test26_1200	2026-04-26 11:04:23 +00:00
Johnny Fernandes	3cfd6b5e81	Approach refinement	2026-04-26 02:55:14 +01:00
Johnny Fernandes	d1aab20322	Approach refinement	2026-04-26 02:19:10 +01:00
Johnny Fernandes	287743709a	Approach refinement	2026-04-26 02:02:25 +01:00
Johnny Fernandes	61f8a7db15	Cleanup and new approach	2026-04-26 01:50:01 +01:00
Johnny Fernandes	b031473758	Behaviour refinement - fence penalty	2026-04-26 01:09:50 +01:00
Johnny Fernandes	6253850620	Behaviour refinement - fence penalty	2026-04-25 23:42:02 +01:00
Johnny Fernandes	6612dbc1ba	Test25_2330	2026-04-25 22:32:06 +00:00
Johnny Fernandes	7b87908410	Behaviour refinement	2026-04-25 21:35:23 +01:00
Johnny Fernandes	e302c76886	Test25_2025	2026-04-25 19:25:39 +00:00
Johnny Fernandes	841f5fa520	Test25_2000	2026-04-25 19:17:40 +00:00
Johnny Fernandes	7bfb7d3aae	Sheep training flock _ improver	2026-04-25 18:46:41 +01:00
Johnny Fernandes	5005128c07	Test25_1820	2026-04-25 17:19:02 +00:00
Johnny Fernandes	16878c5a0b	Sheep training flock _ improver	2026-04-25 18:02:56 +01:00
Johnny Fernandes	75d030cb49	Test25_1800	2026-04-25 17:00:19 +00:00
Johnny Fernandes	cc6d72e472	Sheep training flock _ improver	2026-04-25 17:07:03 +01:00
Johnny Fernandes	3a5decb185	Test25_1700	2026-04-25 16:02:10 +00:00
Johnny Fernandes	75c5b7c014	Sheep training flock _ improver	2026-04-25 16:28:15 +01:00
Johnny Fernandes	4350c7d320	Test25_1600	2026-04-25 15:06:06 +00:00
Johnny Fernandes	cd7e62b1b2	Sheep training flock _ improver	2026-04-25 13:39:49 +01:00
Johnny Fernandes	9bbef28515	Sheep training flock _ improver	2026-04-25 13:30:37 +01:00
Johnny Fernandes	438fa1be1d	Sheep training flock _ improver	2026-04-25 13:24:52 +01:00
Johnny Fernandes	e7c1d82f5c	Test25_1315	2026-04-25 12:14:36 +00:00
Johnny Fernandes	f889dc78cc	Sheep training flock _ improver	2026-04-25 12:50:06 +01:00
Johnny Fernandes	19bfac9bd9	Test25_1245	2026-04-25 11:47:37 +00:00
Johnny Fernandes	02b20fbdb4	Sheep training flock _ improver	2026-04-25 12:20:42 +01:00


91 Commits
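
The calibration described in commit ee77c8606c (scale the lateral channel by `strafe_efficiency`, add `strafe_to_forward_bleed` to the forward channel, leave forward commands untouched) can be illustrated with a minimal sketch. This is not the repository's `mecanum_kinematics_step`: the names `StrafeCalibration`, `realised_body_velocity` and `integrate_pose` are hypothetical, the body frame is assumed to be x forward and y left, and the bleed is assumed to be proportional to the magnitude of the commanded strafe, so that a negative value always pushes the robot backward.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class StrafeCalibration:
    """Hypothetical container for the two calibration parameters."""
    strafe_efficiency: float = 1.0        # in (0, 1]; 1.0 is pass-through
    strafe_to_forward_bleed: float = 0.0  # 0.0 is pass-through

    def __post_init__(self):
        if not 0.0 < self.strafe_efficiency <= 1.0:
            raise ValueError("strafe_efficiency must be in (0, 1]")


def realised_body_velocity(vx_cmd, vy_cmd, cal):
    """Map commanded body velocities to realised ones.

    Only the lateral channel is scaled. The bleed term is assumed to
    depend on |vy_cmd|; the forward command itself is not rescaled.
    """
    vy = cal.strafe_efficiency * vy_cmd
    vx = vx_cmd + cal.strafe_to_forward_bleed * abs(vy_cmd)
    return vx, vy


def integrate_pose(x, y, theta, vx_cmd, vy_cmd, omega, dt,
                   cal=StrafeCalibration()):
    """One explicit-Euler step of the planar pose in the world frame."""
    vx, vy = realised_body_velocity(vx_cmd, vy_cmd, cal)
    x += (vx * math.cos(theta) - vy * math.sin(theta)) * dt
    y += (vx * math.sin(theta) + vy * math.cos(theta)) * dt
    theta += omega * dt
    return x, y, theta


if __name__ == "__main__":
    # Values quoted in the commit message for the HERDING_MEC_WEBOTS preset.
    cal = StrafeCalibration(strafe_efficiency=0.4,
                            strafe_to_forward_bleed=-0.28)
    pose = (0.0, 0.0, 0.0)
    for _ in range(100):  # pure strafe command for 1 s at dt = 0.01
        pose = integrate_pose(*pose, vx_cmd=0.0, vy_cmd=1.0, omega=0.0,
                              dt=0.01, cal=cal)
    print(pose)
```

With the default `StrafeCalibration()` the function returns the commanded velocities unchanged, which mirrors the pass-through defaults the commit describes for existing pipelines.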