Webots sim-to-real fixes, DAgger pipeline, 360° proto variant

Today's session worked across the full Webots delivery stack — found and
fixed a cluster of bugs blocking the BC/RL transfer, then explored
training-side mitigations for the residual perception gap.

Bug fixes:
- Makefile FP_RATE default 2.0 → 0.0: BC demos used fp_rate=0 but RL
  fine-tune defaulted to fp_rate=2, poisoning the BC obs distribution
  and stalling PPO at 0% success across 1.46M+ steps.
- controllers/{shepherd_dog,sheep}/runtime.ini: Webots was launching
  controllers under system python3 (no numpy) and they were crashing
  silently. Pinned to the conda tir env.
- herding/config.py HERDING_WEBOTS preset: pen_latch_depth 0.5 → 2.0,
  max_new_tracks_per_step 3 → 1, static_reject 0.8 → 1.2. Stops phantom
  FPs near the gate from latching as permanently-penned tracks.
- herding/perception/sheep_tracker.py: penned tracks now decay at
  forget_steps × 8 instead of living forever. Adds get_positions
  min_freshness filter for deploy-time use.

Training/eval matches deployment:
- training/bc/collect.py: --dagger-policy flag for DAgger rollouts
  (policy drives, teacher labels) + --use-webots-preset for matched
  140° tracker + DR config.
- controllers/shepherd_dog/shepherd_dog.py: scan-fallback (0, 0.6) when
  BC/RL sees empty sheep_positions — recovers from FOV gaps.

Tooling:
- tools/dagger_round.sh: one-shot DAgger round (collect + concat + bc).
- tools/webots_sweep_gt.sh: full sweep with HERDING_USE_GT=1 for the
  perception-gap diagnosis matrix.
- protos/ShepherdDog360.proto: 360° FOV variant for the FOV-ablation
  comparison. Canonical proto stays at 140° per project spec.

Artifacts: v1 BC/RL policies for all 4 (drive × world) combos trained
in clean gym (success: diff/field 90-100%, diff/round 58%, mec/field
60-100%, mec/round 50-100%). DAgger r1/r2 BCs for diff/field show
12%→38% progression on gym HERDING_WEBOTS proxy but did not close
to actual Webots LiDAR (0/5 throughout). Next: LSTM policy or
learned tracker per the project-state memory.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
Johnny Fernandes
2026-05-16 17:21:02 +00:00
parent c61df91950
commit dd5ac669e5
34 changed files with 2336 additions and 188 deletions
+86 -23
View File
@@ -21,22 +21,9 @@ from pathlib import Path
import numpy as np
# Early CLI parse so we can configure geometry before heavy imports.
# (argparse is used again below for the full parse; this is a lightweight
# pre-pass that only reads --world.)
_pre_argv = [a for a in os.sys.argv[1:]]
_pre_world = None
for i, a in enumerate(_pre_argv):
if a == "--world" and i + 1 < len(_pre_argv):
_pre_world = _pre_argv[i + 1]
break
if a.startswith("--world="):
_pre_world = a.split("=", 1)[1]
break
if _pre_world is not None:
from herding.world.geometry import configure as _geo_configure
_geo_configure(_pre_world)
os.environ["HERDING_WORLD"] = _pre_world
# Configure field geometry before other herding imports read it at module level.
from herding.world.geometry import configure_from_args as _configure_from_args
_configure_from_args()
from herding.control.active_scan import ActiveScanTeacher
from herding.world.geometry import PEN_ENTRY, FIELD_SHAPE
@@ -83,10 +70,17 @@ def _call_teacher(fn, dog_xy, dog_heading, sheep_positions, pen_target,
def collect_one(n_sheep: int, seed: int, max_steps: int, subsample: int,
teacher_fn, frame_stack: int = 1, privileged: bool = False,
drive_mode: str = "differential"):
drive_mode: str = "differential", herding_cfg=None,
actor_policy=None):
"""Collect (obs, teacher_action) pairs from one episode.
``actor_policy`` (DAgger mode): a callable ``policy(obs) -> action`` that
drives the env. The teacher still labels each visited state. If ``None``
(default), the teacher drives.
"""
env = HerdingEnv(n_sheep=n_sheep, max_steps=max_steps,
difficulty=1.0, seed=seed, frame_stack=frame_stack,
drive_mode=drive_mode)
drive_mode=drive_mode, herding_cfg=herding_cfg)
obs, _ = env.reset(seed=seed)
obs_list, action_list = [], []
scan_teacher = ActiveScanTeacher(teacher_fn)
@@ -108,13 +102,16 @@ def collect_one(n_sheep: int, seed: int, max_steps: int, subsample: int,
)
vx, vy, omega, _mode = result
if drive_mode == "mecanum":
action = np.array([vx, vy, omega], dtype=np.float32)
teacher_action = np.array([vx, vy, omega], dtype=np.float32)
else:
action = np.array([vx, vy], dtype=np.float32)
teacher_action = np.array([vx, vy], dtype=np.float32)
if step % subsample == 0:
obs_list.append(obs.copy())
action_list.append(action.copy())
obs, _r, term, trunc, _info = env.step(action)
action_list.append(teacher_action.copy())
# In DAgger mode the policy drives; otherwise the teacher does.
step_action = (actor_policy(obs) if actor_policy is not None
else teacher_action)
obs, _r, term, trunc, _info = env.step(step_action)
if term or trunc:
break
success = bool(env.sheep_penned.all())
@@ -153,6 +150,24 @@ def main():
help="World shape. If not set, uses HERDING_WORLD "
"env var or defaults to 'field'. Must be set "
"before geometry is imported.")
# Domain randomisation — applied to the gym env during collection so
# the teacher demonstrates under the same noise the policy will face.
parser.add_argument("--fp-rate", type=float, default=0.0,
help="Mean false-positive detections injected per "
"step (Poisson λ). 0 = clean sim (default).")
parser.add_argument("--action-smooth", type=float, default=0.0,
help="EMA coefficient on dog actions (0 = none). "
"Set to 0.55 to match the Webots controller.")
parser.add_argument("--wheel-slip-std", type=float, default=0.0,
help="Gaussian noise (rad/s) on wheel speeds for "
"mecanum dynamics domain randomisation.")
parser.add_argument("--dagger-policy", default=None,
help="Path to a BC/PPO policy directory. When set, "
"the policy drives the env (DAgger) while the "
"teacher labels every visited state.")
parser.add_argument("--use-webots-preset", action="store_true",
help="Use HERDING_WEBOTS preset (140° FOV + tight "
"tracker). Match this to deployment for DAgger.")
args = parser.parse_args()
# Validate --world matches geometry (already configured by the
@@ -161,6 +176,53 @@ def main():
print(f"[demos] WARNING: --world={args.world} but geometry is "
f"'{FIELD_SHAPE}'. This should not happen — file a bug.")
from herding.config import HerdingConfig, HERDING_WEBOTS, DomainRandomConfig, RobotConfig
if args.use_webots_preset:
herding_cfg = HERDING_WEBOTS.replace(
domain_random=DomainRandomConfig(
fp_rate=args.fp_rate,
wheel_slip_std=args.wheel_slip_std,
),
robot=RobotConfig(action_smooth=args.action_smooth),
)
print(f"[demos] HERDING_WEBOTS preset + DR: fp_rate={args.fp_rate} "
f"action_smooth={args.action_smooth} wheel_slip_std={args.wheel_slip_std}")
else:
herding_cfg = None
if args.fp_rate > 0.0 or args.action_smooth > 0.0 or args.wheel_slip_std > 0.0:
herding_cfg = HerdingConfig(
domain_random=DomainRandomConfig(
fp_rate=args.fp_rate,
wheel_slip_std=args.wheel_slip_std,
),
robot=RobotConfig(action_smooth=args.action_smooth),
)
print(f"[demos] domain-random: fp_rate={args.fp_rate} "
f"action_smooth={args.action_smooth} "
f"wheel_slip_std={args.wheel_slip_std}")
actor_policy = None
if args.dagger_policy is not None:
# DAgger: failures are the most valuable data (off-policy states
# where the student needs teacher correction). Always keep them.
args.keep_failures = True
from stable_baselines3 import PPO
from pathlib import Path as _P
run = _P(args.dagger_policy)
for name in ("policy.zip", "final.zip"):
if (run / name).exists():
zip_path = run / name
break
else:
raise FileNotFoundError(
f"No policy found in {run} (tried policy.zip, final.zip)")
_model = PPO.load(str(zip_path), device="auto")
print(f"[demos] DAgger mode: actor = {zip_path}")
def actor_policy(obs):
obs_b = np.asarray(obs, dtype=np.float32).reshape(1, -1)
a, _ = _model.predict(obs_b, deterministic=True)
return a[0]
teacher_fn = TEACHERS[args.teacher]
print(f"[demos] teacher: {args.teacher} world: {FIELD_SHAPE}")
@@ -177,7 +239,8 @@ def main():
obs, actions, success, total_steps = collect_one(
n, seed, args.max_steps, args.subsample, teacher_fn,
frame_stack=args.frame_stack, privileged=args.privileged,
drive_mode=args.drive_mode,
drive_mode=args.drive_mode, herding_cfg=herding_cfg,
actor_policy=actor_policy,
)
n_total += 1
if success:
+3 -15
View File
@@ -18,21 +18,9 @@ from statistics import mean
import numpy as np
# Early CLI pre-parse for --world so geometry is configured before
# other herding.* modules are imported.
_pre_argv = [a for a in os.sys.argv[1:]]
_pre_world = None
for i, a in enumerate(_pre_argv):
if a == "--world" and i + 1 < len(_pre_argv):
_pre_world = _pre_argv[i + 1]
break
if a.startswith("--world="):
_pre_world = a.split("=", 1)[1]
break
if _pre_world is not None:
from herding.world.geometry import configure as _geo_configure
_geo_configure(_pre_world)
os.environ["HERDING_WORLD"] = _pre_world
# Configure field geometry before other herding imports read it at module level.
from herding.world.geometry import configure_from_args as _configure_from_args
_configure_from_args()
from herding.world.geometry import MAX_SHEEP, PEN_ENTRY
from herding.control.sequential import compute_action as sequential_action
+77 -1
View File
@@ -47,6 +47,7 @@ from herding.perception.lidar_sim import simulate_scan
from herding.perception.obs import OBS_DIM, build_obs
from herding.perception.sheep_tracker import SheepTracker
from herding.control.strombom import compute_action as strombom_action
from herding.config import HerdingConfig
class HerdingEnv(gym.Env):
@@ -87,13 +88,24 @@ class HerdingEnv(gym.Env):
use_lidar: bool = True,
frame_stack: int = 1,
drive_mode: str = "differential",
herding_cfg: Optional[HerdingConfig] = None,
):
super().__init__()
# Store the config; fall back to defaults when None.
self._herding_cfg = herding_cfg
# Apply robot config overrides — these shadow the class attributes
# so that per-instance configuration is possible without touching
# the class-level defaults used by unconfigured instances.
if herding_cfg is not None:
self.ACTION_SMOOTH = herding_cfg.robot.action_smooth
# ``use_lidar=True`` (default): obs and imitation-reward teacher
# see only LiDAR-perceived positions via a tracker, matching the
# Webots controller. ``False`` exposes ground truth for ablation.
self._use_lidar = bool(use_lidar)
self._tracker = SheepTracker() if self._use_lidar else None
tracker_cfg = herding_cfg.tracker if herding_cfg is not None else None
self._tracker = SheepTracker(tracker_cfg=tracker_cfg) if self._use_lidar else None
self._np_rng_lidar: Optional[np.random.Generator] = None
# Frame stacking: the policy receives the last K obs concatenated,
@@ -261,6 +273,14 @@ class HerdingEnv(gym.Env):
vx, vy = float(self.smoothed_action[0]), float(self.smoothed_action[1])
omega = float(self.smoothed_action[2]) if self._action_dim >= 3 else 0.0
# Domain randomisation: compass (heading) noise.
dr = (self._herding_cfg.domain_random
if self._herding_cfg is not None else None)
slip_std = dr.wheel_slip_std if dr is not None else 0.0
if dr is not None and dr.compass_noise_std > 0.0 and self._np_rng_lidar is not None:
self.dog_heading = float(self.dog_heading + self._np_rng_lidar.normal(
0.0, dr.compass_noise_std))
# Safety supervisor — dog stays north of the gate.
if self.dog_y < DOG_SOUTH_LIMIT and vy < 0.0:
vx, vy = 0.0, 1.0
@@ -282,6 +302,8 @@ class HerdingEnv(gym.Env):
DOG_WHEEL_RADIUS,
DOG_WHEEL_BASE_X / 2.0, DOG_WHEEL_BASE_Y / 2.0,
WEBOTS_DT,
slip_std=slip_std,
rng=self._np_rng_lidar,
)
else:
wL, wR = velocity_to_wheels(
@@ -294,6 +316,8 @@ class HerdingEnv(gym.Env):
self.dog_x, self.dog_y, self.dog_heading = kinematics_step(
self.dog_x, self.dog_y, self.dog_heading,
wL, wR, DOG_WHEEL_RADIUS, DOG_WHEEL_BASE, WEBOTS_DT,
slip_std=slip_std,
rng=self._np_rng_lidar,
)
self.dog_x, self.dog_y = clip_to_field(self.dog_x, self.dog_y, margin=0.3)
# Extra constraint: dog stays north of the gate area.
@@ -460,16 +484,68 @@ class HerdingEnv(gym.Env):
for i in range(self.n_sheep)]
def _update_tracker(self) -> None:
lidar_cfg = (self._herding_cfg.lidar
if self._herding_cfg is not None else None)
detection_cfg = (self._herding_cfg.detection
if self._herding_cfg is not None else None)
ranges = simulate_scan(
self.dog_x, self.dog_y, self.dog_heading,
self._all_sheep_xy(),
rng=self._np_rng_lidar,
lidar_cfg=lidar_cfg,
)
detections = detections_from_scan(
ranges, self.dog_x, self.dog_y, self.dog_heading,
detection_cfg=detection_cfg,
lidar_cfg=lidar_cfg,
)
# Domain randomisation: inject false-positive detections near static
# features to mimic the spurious clusters Webots' physical LiDAR
# produces from real 3D geometry (walls, posts, fence rails).
dr = (self._herding_cfg.domain_random
if self._herding_cfg is not None else None)
if dr is not None and dr.fp_rate > 0.0 and self._np_rng_lidar is not None:
detections = list(detections)
detections.extend(
self._sample_false_positives(dr.fp_rate, dr.fp_std_pos))
self._tracker.update(detections)
# Static feature anchor points used for FP injection.
# The rectangular list covers gate posts and field corners; the round
# list covers just the gate posts (the circular wall is handled separately).
_FP_ANCHORS_RECT = (
(10.0, -15.0), (13.0, -15.0), # gate posts
(15.0, 15.0), (15.0, -15.0),
(-15.0, 15.0), (-15.0, -15.0), # field corners
(15.0, 0.0), (-15.0, 0.0), # mid-wall returns
(0.0, 15.0), (0.0, -15.0),
)
_FP_ANCHORS_ROUND = (
(0.0, -15.0), # gate centre
(-1.5, -15.0), (1.5, -15.0), # gate posts
(0.0, 15.0), # north wall
(10.6, -10.6), (-10.6, -10.6), # circular wall quadrants
)
def _sample_false_positives(
self, fp_rate: float, fp_std: float,
) -> list[tuple[float, float]]:
"""Return a list of spurious (x, y) detections for this step."""
from herding.world.geometry import FIELD_SHAPE
anchors = (self._FP_ANCHORS_ROUND
if FIELD_SHAPE == "field_round"
else self._FP_ANCHORS_RECT)
n_fps = int(self._np_rng_lidar.poisson(fp_rate))
if n_fps == 0:
return []
fps = []
chosen = self._np_rng_lidar.integers(0, len(anchors), size=n_fps)
noise = self._np_rng_lidar.normal(0.0, fp_std, size=(n_fps, 2))
for k in range(n_fps):
ax, ay = anchors[chosen[k]]
fps.append((float(ax + noise[k, 0]), float(ay + noise[k, 1])))
return fps
def perceived_positions(self) -> dict[str, tuple[float, float]]:
"""What the controller would "see" this step: tracker output in
LiDAR mode, ground truth in privileged mode. Used by demo
+31 -20
View File
@@ -23,22 +23,9 @@ import argparse
import os
from pathlib import Path
# Early CLI pre-parse for --world so geometry is configured before any
# herding.* / training.* import binds geometry constants. Matches the
# pattern used by training.bc.collect and training.eval.
_pre_argv = [a for a in os.sys.argv[1:]]
_pre_world = None
for i, a in enumerate(_pre_argv):
if a == "--world" and i + 1 < len(_pre_argv):
_pre_world = _pre_argv[i + 1]
break
if a.startswith("--world="):
_pre_world = a.split("=", 1)[1]
break
if _pre_world is not None:
from herding.world.geometry import configure as _geo_configure
_geo_configure(_pre_world)
os.environ["HERDING_WORLD"] = _pre_world
# Configure field geometry before other herding imports read it at module level.
from herding.world.geometry import configure_from_args as _configure_from_args
_configure_from_args()
import numpy as np
import torch as th
@@ -59,11 +46,12 @@ from training.herding_env import HerdingEnv
def _make_env(rank: int, seed: int, frame_stack: int,
drive_mode: str = "differential",
difficulty: float = 1.0,
max_n_sheep: int = 10):
max_n_sheep: int = 10,
herding_cfg=None):
def _thunk():
env = HerdingEnv(seed=seed + rank, frame_stack=frame_stack,
drive_mode=drive_mode, difficulty=difficulty,
max_n_sheep=max_n_sheep)
max_n_sheep=max_n_sheep, herding_cfg=herding_cfg)
env = Monitor(env, info_keywords=("is_success", "n_sheep", "n_penned"))
return env
return _thunk
@@ -241,6 +229,13 @@ def main() -> None:
choices=["field", "field_round"],
help="World shape. If not set, uses HERDING_WORLD "
"env var or defaults to 'field'.")
# Domain randomisation
parser.add_argument("--fp-rate", type=float, default=0.0,
help="Mean false-positive detections per step (Poisson λ).")
parser.add_argument("--action-smooth", type=float, default=0.0,
help="EMA on dog actions (0=none, 0.55=Webots match).")
parser.add_argument("--wheel-slip-std", type=float, default=0.0,
help="Gaussian wheel-speed noise std (rad/s).")
args = parser.parse_args()
# --world was already honoured in the early pre-parse above; here we
# just sanity-check that the final argparse view agrees.
@@ -280,15 +275,31 @@ def main() -> None:
drive_mode = "differential"
print(f"[rl] drive_mode={drive_mode} (BC action_dim={bc_action_dim})")
from herding.config import HerdingConfig, DomainRandomConfig, RobotConfig
herding_cfg = None
if args.fp_rate > 0.0 or args.action_smooth > 0.0 or args.wheel_slip_std > 0.0:
herding_cfg = HerdingConfig(
domain_random=DomainRandomConfig(
fp_rate=args.fp_rate,
wheel_slip_std=args.wheel_slip_std,
),
robot=RobotConfig(action_smooth=args.action_smooth),
)
print(f"[rl] domain-random: fp_rate={args.fp_rate} "
f"action_smooth={args.action_smooth} "
f"wheel_slip_std={args.wheel_slip_std}")
env_fns = [_make_env(i, args.seed, frame_stack, drive_mode,
difficulty=args.difficulty,
max_n_sheep=args.max_n_sheep)
max_n_sheep=args.max_n_sheep,
herding_cfg=herding_cfg)
for i in range(args.n_envs)]
venv = SubprocVecEnv(env_fns) if args.n_envs > 1 else DummyVecEnv(env_fns)
eval_venv = DummyVecEnv([_make_env(99, args.seed + 999, frame_stack,
drive_mode,
difficulty=args.difficulty,
max_n_sheep=args.max_n_sheep)])
max_n_sheep=args.max_n_sheep,
herding_cfg=herding_cfg)])
print(f"[rl] difficulty={args.difficulty} max_n_sheep={args.max_n_sheep}")
# Reward-shaping overrides (broadcast to every env instance).
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.