Webots sim-to-real fixes, DAgger pipeline, 360° proto variant
This session covered the full Webots delivery stack: it found and fixed
a cluster of bugs blocking the BC/RL transfer, then explored
training-side mitigations for the residual perception gap.
Bug fixes:
- Makefile FP_RATE default 2.0 → 0.0: BC demos used fp_rate=0 but RL
fine-tune defaulted to fp_rate=2, so fine-tuning saw observations
outside the BC training distribution and PPO stalled at 0% success
across 1.46M+ steps.
- controllers/{shepherd_dog,sheep}/runtime.ini: Webots was launching
controllers under system python3 (no numpy) and they were crashing
silently. Pinned to the conda tir env.
- herding/config.py HERDING_WEBOTS preset: pen_latch_depth 0.5 → 2.0,
max_new_tracks_per_step 3 → 1, static_reject 0.8 → 1.2. Stops phantom
FPs near the gate from latching as permanently-penned tracks.
- herding/perception/sheep_tracker.py: penned tracks now decay at
forget_steps × 8 instead of living forever. Adds a min_freshness
filter to get_positions for deploy-time use.
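A minimal sketch of the tracker change above, assuming each track counts the steps since its last matched detection. Only forget_steps, the × 8 multiplier, get_positions and min_freshness come from this commit; the Track and SheepTrackerSketch classes, the linear freshness definition and the default values are hypothetical.

```python
from dataclasses import dataclass, field

PENNED_FORGET_MULT = 8  # from the commit: penned tracks decay at forget_steps x 8


@dataclass
class Track:
    # Hypothetical track record; the real tracker's fields may differ.
    pos: tuple
    penned: bool = False
    steps_since_seen: int = 0


@dataclass
class SheepTrackerSketch:
    forget_steps: int = 20  # assumed default, not taken from the repo
    tracks: list = field(default_factory=list)

    def _limit(self, t: Track) -> int:
        # Penned tracks live longer, but no longer forever.
        return self.forget_steps * (PENNED_FORGET_MULT if t.penned else 1)

    def step(self) -> None:
        """Age every track by one step and drop those past their limit."""
        for t in self.tracks:
            t.steps_since_seen += 1
        self.tracks = [t for t in self.tracks
                       if t.steps_since_seen <= self._limit(t)]

    def get_positions(self, min_freshness: float = 0.0) -> list:
        """Return positions of tracks at least min_freshness fresh.

        Freshness is assumed to fall linearly from 1.0 (just seen)
        to 0.0 (about to be forgotten).
        """
        out = []
        for t in self.tracks:
            freshness = 1.0 - t.steps_since_seen / self._limit(t)
            if freshness >= min_freshness:
                out.append(t.pos)
        return out
```

At deploy time a caller could pass a non-zero min_freshness so the policy only sees recently confirmed tracks, while training keeps the default.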
Training/eval matches deployment:
- training/bc/collect.py: --dagger-policy flag for DAgger rollouts
(policy drives, teacher labels) + --use-webots-preset for matched
140° tracker + DR config.
- controllers/shepherd_dog/shepherd_dog.py: scan-fallback (0, 0.6) when
BC/RL sees empty sheep_positions — recovers from FOV gaps.
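The scan-fallback can be sketched as a thin wrapper around the policy call. The (0, 0.6) value and the empty sheep_positions trigger come from this commit; the select_action helper is hypothetical, and reading the tuple as (forward, turn), so that the dog rotates in place to sweep its 140° FOV, is an assumption.

```python
from typing import Callable, Sequence, Tuple

Action = Tuple[float, float]

# Value from the commit; the (forward, turn) axis meaning is assumed.
SCAN_FALLBACK: Action = (0.0, 0.6)


def select_action(sheep_positions: Sequence,
                  policy: Callable[[Sequence], Action]) -> Action:
    """Return the scan action when no sheep are tracked, else ask the policy."""
    if len(sheep_positions) == 0:
        return SCAN_FALLBACK
    return policy(sheep_positions)
```

Keeping the fallback outside the policy means the BC/RL network never has to act on an empty observation it did not see during training.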
Tooling:
- tools/dagger_round.sh: one-shot DAgger round (collect + concat + bc).
- tools/webots_sweep_gt.sh: full sweep with HERDING_USE_GT=1 for the
perception-gap diagnosis matrix.
- protos/ShepherdDog360.proto: 360° FOV variant for the FOV-ablation
comparison. Canonical proto stays at 140° per project spec.
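For readers unfamiliar with the DAgger round mentioned above (collect + concat + bc, with the policy driving and the teacher labelling), here is a toy sketch on a 1-D problem. Every name in it (teacher, collect, fit_bc, dagger_round) is hypothetical, and the nearest-neighbour "BC" stands in for real training; it is not the logic of tools/dagger_round.sh.

```python
import random


def teacher(obs: float) -> float:
    """Hypothetical expert: push the state toward the origin."""
    return -1.0 if obs > 0 else 1.0


def collect(policy, n_steps: int, rng: random.Random) -> list:
    """Roll out `policy`, but store the teacher's action as the label."""
    data, obs = [], rng.uniform(-1.0, 1.0)
    for _ in range(n_steps):
        data.append((obs, teacher(obs)))  # teacher labels the visited state
        obs += 0.1 * policy(obs)          # learner's action drives the state
    return data


def fit_bc(dataset: list):
    """Toy behaviour cloning: a 1-nearest-neighbour lookup over the labels."""
    data = list(dataset)  # copy so later concatenation does not alias

    def policy(obs: float) -> float:
        _, action = min(data, key=lambda pair: abs(pair[0] - obs))
        return action

    return policy


def dagger_round(policy, dataset: list, n_steps: int, rng: random.Random):
    """One round: collect with the current policy, concat, refit BC."""
    dataset = dataset + collect(policy, n_steps, rng)
    return fit_bc(dataset), dataset


rng = random.Random(0)
dataset = collect(teacher, 20, rng)  # initial expert demonstrations
policy = fit_bc(dataset)
for _ in range(2):                   # two rounds, analogous to r1/r2
    policy, dataset = dagger_round(policy, dataset, 20, rng)
```

The point of the structure is that later rounds add states the learner actually visits, each labelled with what the teacher would have done there.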
Artifacts: v1 BC/RL policies for all 4 (drive × world) combos trained
in clean gym (success: diff/field 90-100%, diff/round 58%, mec/field
60-100%, mec/round 50-100%). DAgger r1/r2 BCs for diff/field improve
from 12% to 38% on the gym HERDING_WEBOTS proxy, but the gain does not
transfer to actual Webots LiDAR (0/5 throughout). Next: LSTM policy or
learned tracker per the project-state memory.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
+31
-20
@@ -23,22 +23,9 @@ import argparse
 import os
 from pathlib import Path
 
-# Early CLI pre-parse for --world so geometry is configured before any
-# herding.* / training.* import binds geometry constants. Matches the
-# pattern used by training.bc.collect and training.eval.
-_pre_argv = [a for a in os.sys.argv[1:]]
-_pre_world = None
-for i, a in enumerate(_pre_argv):
-    if a == "--world" and i + 1 < len(_pre_argv):
-        _pre_world = _pre_argv[i + 1]
-        break
-    if a.startswith("--world="):
-        _pre_world = a.split("=", 1)[1]
-        break
-if _pre_world is not None:
-    from herding.world.geometry import configure as _geo_configure
-    _geo_configure(_pre_world)
-    os.environ["HERDING_WORLD"] = _pre_world
+# Configure field geometry before other herding imports read it at module level.
+from herding.world.geometry import configure_from_args as _configure_from_args
+_configure_from_args()
 
 import numpy as np
 import torch as th
@@ -59,11 +46,12 @@ from training.herding_env import HerdingEnv
 def _make_env(rank: int, seed: int, frame_stack: int,
               drive_mode: str = "differential",
               difficulty: float = 1.0,
-              max_n_sheep: int = 10):
+              max_n_sheep: int = 10,
+              herding_cfg=None):
     def _thunk():
         env = HerdingEnv(seed=seed + rank, frame_stack=frame_stack,
                          drive_mode=drive_mode, difficulty=difficulty,
-                         max_n_sheep=max_n_sheep)
+                         max_n_sheep=max_n_sheep, herding_cfg=herding_cfg)
         env = Monitor(env, info_keywords=("is_success", "n_sheep", "n_penned"))
         return env
     return _thunk
@@ -241,6 +229,13 @@ def main() -> None:
                         choices=["field", "field_round"],
                         help="World shape. If not set, uses HERDING_WORLD "
                              "env var or defaults to 'field'.")
+    # Domain randomisation
+    parser.add_argument("--fp-rate", type=float, default=0.0,
+                        help="Mean false-positive detections per step (Poisson λ).")
+    parser.add_argument("--action-smooth", type=float, default=0.0,
+                        help="EMA on dog actions (0=none, 0.55=Webots match).")
+    parser.add_argument("--wheel-slip-std", type=float, default=0.0,
+                        help="Gaussian wheel-speed noise std (rad/s).")
     args = parser.parse_args()
     # --world was already honoured in the early pre-parse above; here we
     # just sanity-check that the final argparse view agrees.
@@ -280,15 +275,31 @@ def main() -> None:
         drive_mode = "differential"
     print(f"[rl] drive_mode={drive_mode} (BC action_dim={bc_action_dim})")
 
+    from herding.config import HerdingConfig, DomainRandomConfig, RobotConfig
+    herding_cfg = None
+    if args.fp_rate > 0.0 or args.action_smooth > 0.0 or args.wheel_slip_std > 0.0:
+        herding_cfg = HerdingConfig(
+            domain_random=DomainRandomConfig(
+                fp_rate=args.fp_rate,
+                wheel_slip_std=args.wheel_slip_std,
+            ),
+            robot=RobotConfig(action_smooth=args.action_smooth),
+        )
+        print(f"[rl] domain-random: fp_rate={args.fp_rate} "
+              f"action_smooth={args.action_smooth} "
+              f"wheel_slip_std={args.wheel_slip_std}")
+
     env_fns = [_make_env(i, args.seed, frame_stack, drive_mode,
                          difficulty=args.difficulty,
-                         max_n_sheep=args.max_n_sheep)
+                         max_n_sheep=args.max_n_sheep,
+                         herding_cfg=herding_cfg)
                for i in range(args.n_envs)]
     venv = SubprocVecEnv(env_fns) if args.n_envs > 1 else DummyVecEnv(env_fns)
     eval_venv = DummyVecEnv([_make_env(99, args.seed + 999, frame_stack,
                                        drive_mode,
                                        difficulty=args.difficulty,
-                                       max_n_sheep=args.max_n_sheep)])
+                                       max_n_sheep=args.max_n_sheep,
+                                       herding_cfg=herding_cfg)])
     print(f"[rl] difficulty={args.difficulty} max_n_sheep={args.max_n_sheep}")
 
     # Reward-shaping overrides (broadcast to every env instance).
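The three domain-randomisation flags in the diff (--fp-rate, --action-smooth, --wheel-slip-std) are only described by their help strings; the environment code that consumes them is not part of this change. The sketch below shows one plausible reading of each knob using only the standard library. All function names are hypothetical, and the EMA convention (alpha weights the previous action, so 0 means no smoothing) is an assumption chosen to agree with "0=none".

```python
import math
import random


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw from Poisson(lam) with Knuth's method; adequate for small lam."""
    if lam <= 0.0:
        return 0
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def smooth_action(prev, new, alpha: float):
    """EMA over actions; alpha=0 returns `new` unchanged (assumed convention)."""
    return tuple(alpha * p + (1.0 - alpha) * n for p, n in zip(prev, new))


def slip_wheels(speeds, std: float, rng: random.Random):
    """Add zero-mean Gaussian noise (rad/s) to each wheel speed."""
    if std <= 0.0:
        return tuple(speeds)
    return tuple(w + rng.gauss(0.0, std) for w in speeds)


rng = random.Random(0)
n_false_positives = sample_poisson(2.0, rng)          # phantom detections this step
action = smooth_action((0.0, 0.0), (1.0, 0.5), 0.55)  # 0.55 = Webots match per the help text
wheels = slip_wheels((3.0, 3.0), 0.1, rng)            # noisy wheel speeds
```

With all three values at their 0.0 defaults the helpers are no-ops, which mirrors the diff leaving herding_cfg as None unless at least one flag is positive.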