3b4c99a6c4
* training/bc/collect.py: --use-webots-preset now picks the
drive-matched variant. Mecanum drives get HERDING_MEC_WEBOTS
(with the Webots-calibrated strafe efficiency and bleed) so the
collected demos reflect the imperfect physical mecanum the
deployed policy will see. Differential drives still use
HERDING_WEBOTS (no behaviour change there).
* training/rl/train.py: mecanum fine-tune now *unconditionally*
applies the HERDING_MEC_WEBOTS robot config to the PPO env (the
policy must update against the same imperfect kinematics it
deploys on). Diff fine-tune unchanged.
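A minimal sketch of the drive-matched preset selection described above. Only the preset names HERDING_WEBOTS and HERDING_MEC_WEBOTS come from this change. The RobotPreset type, the select_webots_preset helper, the field names, and all numeric values are hypothetical placeholders, not the repository's actual code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RobotPreset:
    """Hypothetical stand-in for the repository's robot config type."""
    name: str
    strafe_efficiency: float  # fraction of commanded lateral speed achieved
    strafe_bleed: float       # unintended forward drift per unit of strafe


# Placeholder numbers for illustration only; the real presets carry
# Webots-calibrated values that are not shown in this commit message.
HERDING_WEBOTS = RobotPreset("HERDING_WEBOTS", 1.0, 0.0)
HERDING_MEC_WEBOTS = RobotPreset("HERDING_MEC_WEBOTS", 0.8, 0.05)


def select_webots_preset(drive_mode: str) -> RobotPreset:
    """Return the Webots preset matching the drive type.

    Assumption: every drive mode other than "mecanum" is treated as
    differential and keeps the existing HERDING_WEBOTS preset.
    """
    if drive_mode == "mecanum":
        return HERDING_MEC_WEBOTS
    return HERDING_WEBOTS
```

With this shape, collect.py would call the helper only when --use-webots-preset is set, while the mecanum fine-tune path in train.py would apply HERDING_MEC_WEBOTS regardless of the flag.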
To retrain a mecanum policy end-to-end against the new proto:

    python -m training.bc.collect --drive-mode mecanum --world field \
        --use-webots-preset \
        --out training/bc/demos_mecanum_field_v2.npz
    python -m training.bc.pretrain --demos training/bc/demos_mecanum_field_v2.npz \
        --out training/runs/bc_mecanum_field_v2 ...
    python -m training.rl.train --bc training/runs/bc_mecanum_field_v2 \
        --out training/runs/rl_mecanum_field_v2 \
        --drive-mode mecanum --world field --use-webots-preset

The same flow applies to field_round / mecanum/round.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>