Drop webots_quick target; mecanum BC demos now auto-use HERDING_MEC_WEBOTS

* Remove `webots_quick` Makefile target: `make webots` is the only webots entry point now (it fires the interactive picker). The positional non-interactive path is still available as `bash tools/run_webots.sh N MODE DRIVE WORLD` for scripted use.
* Add `WEBOTS_PRESET_FLAG = --use-webots-preset` for mecanum drive and pass it to the `bc.collect` recipe so demos are collected under the gym kinematics that match the physical-roller Webots mecanum. Without this, mecanum BC demos would record textbook X-pattern teacher actions against textbook gym kinematics, and the resulting policy would fail at deployment exactly the same way the current v1 mecanum policies do.
* `rl/train.py` already auto-detects mecanum and applies HERDING_MEC_WEBOTS internally (commit 3b4c99a), so the rl recipe doesn't need the flag; a short comment in the Makefile makes that intent explicit.

Diff drive keeps the existing recipe: no `--use-webots-preset`, so BC demos are collected on HERDING_DEFAULT (360° gym, no FP). This is the regime that produced the current diff/field and diff/round policies that pen 5/5 in Webots LiDAR; retraining under the same regime is the safest reproduction.

126 pytest cases still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
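
The per-drive flag selection described above can be sketched in shell. This is an illustration only, not code from the repository: `preset_flag` is a hypothetical helper that mirrors the Makefile's `ifeq ($(DRIVE),mecanum)` branch, and the drive name `diff` is an assumption taken from the policy names in the message.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repo) mirroring the Makefile conditional:
# print --use-webots-preset for mecanum, print nothing for any other drive.
preset_flag() {
    if [ "$1" = "mecanum" ]; then
        printf '%s' '--use-webots-preset'
    fi
}

# Show the flag (if any) that each drive would append to the bc.collect recipe.
# "diff" is an assumed drive name used only for this illustration.
for drive in mecanum diff; do
    printf '%s -> [%s]\n' "$drive" "$(preset_flag "$drive")"
done
```

An empty result for the non-mecanum case matters because the Makefile expands `$(WEBOTS_PRESET_FLAG)` unquoted at the end of the recipe, so an empty value adds no argument to the `training.bc.collect` command line.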
2026-05-17 10:44:15 +00:00
parent e86fee5ae8
commit 1df84ae4b5
1 changed files with 16 additions and 5 deletions
@@ -146,7 +146,7 @@ MODE             ?= rl
 .PHONY: all bc_demos bc rl rl_fast eval eval_fast eval_all eval_all_fast \
-        test webots webots_quick webots_sweep clean clean_all help \
+        test webots webots_sweep clean clean_all help \
        train_all train_diff_rect train_diff_round \
        train_mec_rect train_mec_round \
        train_all_fast train_diff_rect_fast train_diff_round_fast \
@@ -161,6 +161,17 @@ export HERDING_WORLD = $(WORLD)
 # the build is run under tee / nohup / tmux pipes.
 export PYTHONUNBUFFERED = 1
+# Mecanum needs --use-webots-preset so collect/rl pick up
+# HERDING_MEC_WEBOTS — the gym mecanum kinematics get the strafe
+# efficiency and forward-bleed match against the physical-roller
+# Webots proto. Without this flag the policy trains on textbook
+# X-pattern mecanum and fails on deployment.
+ifeq ($(DRIVE),mecanum)
+WEBOTS_PRESET_FLAG = --use-webots-preset
+else
+WEBOTS_PRESET_FLAG =
+endif
 bc_demos: $(BC_DEMOS)
 $(BC_DEMOS):
 	$(PY) -m training.bc.collect \
@@ -171,7 +182,8 @@ $(BC_DEMOS):
 		--max-steps $(DEMO_MAX_STEPS) \
 		--fp-rate $(FP_RATE) \
 		--action-smooth $(ACTION_SMOOTH_TRAIN) \
-		--wheel-slip-std $(WHEEL_SLIP_STD)
+		--wheel-slip-std $(WHEEL_SLIP_STD) \
+		$(WEBOTS_PRESET_FLAG)
 bc: $(BC_POLICY)
 $(BC_POLICY): $(BC_DEMOS)
@@ -190,6 +202,8 @@ $(RL_POLICY): $(BC_POLICY)
 		--fp-rate $(FP_RATE) \
 		--action-smooth $(ACTION_SMOOTH_TRAIN) \
 		--wheel-slip-std $(WHEEL_SLIP_STD)
+	# (rl/train.py auto-applies HERDING_MEC_WEBOTS when drive=mecanum;
+	# no --use-webots-preset flag needed.)
 eval: $(RL_POLICY)
 	$(PY) -m training.eval --policy $(RL_DIR) \
@@ -223,9 +237,6 @@ test:
 webots:
 	@bash tools/webots_menu.sh
-webots_quick:
-	tools/run_webots.sh $(N) $(MODE) $(DRIVE) $(WORLD)
 # Headless sweep across all modes × worlds × flock sizes.
 # Results are written to webots_sweep.log.
 # Set USE_GT=1 to bypass LiDAR tracker (isolate perception from policy).