Webots sim-to-real fixes, DAgger pipeline, 360° proto variant

Today's session worked across the full Webots delivery stack: found and fixed a cluster of bugs blocking the BC/RL transfer, then explored training-side mitigations for the residual perception gap.

Bug fixes:
- Makefile FP_RATE default 2.0 → 0.0: BC demos used fp_rate=0 but RL fine-tune defaulted to fp_rate=2, poisoning the BC obs distribution and stalling PPO at 0% success across 1.46M+ steps.
- controllers/{shepherd_dog,sheep}/runtime.ini: Webots was launching controllers under system python3 (no numpy) and they were crashing silently. Pinned to the conda tir env.
- herding/config.py HERDING_WEBOTS preset: pen_latch_depth 0.5 → 2.0, max_new_tracks_per_step 3 → 1, static_reject 0.8 → 1.2. Stops phantom FPs near the gate from latching as permanently-penned tracks.
- herding/perception/sheep_tracker.py: penned tracks now decay at forget_steps × 8 instead of living forever. Adds get_positions min_freshness filter for deploy-time use.

Training/eval matches deployment:
- training/bc/collect.py: --dagger-policy flag for DAgger rollouts (policy drives, teacher labels) + --use-webots-preset for matched 140° tracker + DR config.
- controllers/shepherd_dog/shepherd_dog.py: scan-fallback (0, 0.6) when BC/RL sees empty sheep_positions; recovers from FOV gaps.

Tooling:
- tools/dagger_round.sh: one-shot DAgger round (collect + concat + bc).
- tools/webots_sweep_gt.sh: full sweep with HERDING_USE_GT=1 for the perception-gap diagnosis matrix.
- protos/ShepherdDog360.proto: 360° FOV variant for the FOV-ablation comparison. Canonical proto stays at 140° per project spec.

Artifacts: v1 BC/RL policies for all 4 (drive × world) combos trained in clean gym (success: diff/field 90-100%, diff/round 58%, mec/field 60-100%, mec/round 50-100%). DAgger r1/r2 BCs for diff/field show 12%→38% progression on gym HERDING_WEBOTS proxy but did not close to actual Webots LiDAR (0/5 throughout).

Next: LSTM policy or learned tracker per the project-state memory.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 17:21:02 +00:00
parent c61df91950
commit dd5ac669e5
34 changed files with 2336 additions and 188 deletions
@@ -0,0 +1,335 @@
+"""Central configuration dataclasses for the herding simulation.
+
+Every tunable constant that previously lived as a module-level literal in
+perception/lidar_sim.py, perception/lidar_perception.py,
+perception/sheep_tracker.py, world/geometry.py, or training/herding_env.py
+is now represented here as a field with its original default value.
+
+Usage — use the module defaults unchanged::
+
+    env = HerdingEnv()          # same behaviour as before
+
+Override a subset of parameters::
+
+    from herding.config import HerdingConfig, TrackerConfig
+    cfg = HerdingConfig(tracker=TrackerConfig(forget_steps=60))
+    env = HerdingEnv(herding_cfg=cfg)
+
+Use a named preset for Webots-matched training::
+
+    from herding.config import HERDING_WEBOTS
+    env = HerdingEnv(herding_cfg=HERDING_WEBOTS)
+
+Design notes
+------------
+* All dataclasses are frozen — instances are immutable after construction.
+* This module must not import from other ``herding.*`` packages to avoid
+  import cycles.  Field-geometry constants (pen coordinates, field size)
+  stay in ``herding.world.geometry`` because they depend on the world
+  variant selected at runtime via ``HERDING_WORLD``.
+"""
+
+from __future__ import annotations
+
+import math
+from dataclasses import dataclass, field, replace
+
+
+# ---------------------------------------------------------------------------
+# LiDAR hardware spec
+# ---------------------------------------------------------------------------
+
+@dataclass(frozen=True)
+class LidarConfig:
+    """Parameters of the simulated / physical LiDAR sensor.
+
+    The two canonical presets are :data:`LIDAR_FULL` (360°, oracle mode)
+    and :data:`LIDAR_WEBOTS` (140°/180-ray, matches the ShepherdDog proto).
+    """
+
+    n_rays: int = 360
+    """Number of rays in the scan."""
+
+    fov_rad: float = 2.0 * math.pi
+    """Full field-of-view in radians, centred on the robot's forward axis."""
+
+    max_range: float = 12.0
+    """Maximum detectable range in metres."""
+
+    noise_std: float = 0.005
+    """Gaussian standard deviation (metres) applied to each hit reading."""
+
+    sheep_radius: float = 0.30
+    """Effective disc radius of a sheep in the 2-D LiDAR plane (metres)."""
+
+    post_radius: float = 0.25
+    """Effective disc radius of gate / corner posts (metres)."""
+
+    def __post_init__(self) -> None:
+        if self.n_rays < 1:
+            raise ValueError(f"n_rays must be ≥ 1, got {self.n_rays}")
+        if not (0.0 < self.fov_rad <= 2.0 * math.pi):
+            raise ValueError(f"fov_rad must be in (0, 2π], got {self.fov_rad:.4f}")
+        if self.max_range <= 0.0:
+            raise ValueError(f"max_range must be > 0, got {self.max_range}")
+
+
+# Named presets -----------------------------------------------------------
+
+LIDAR_FULL = LidarConfig(
+    n_rays=360,
+    fov_rad=2.0 * math.pi,
+)
+"""360° full-circle scan — oracle / ablation mode."""
+
+LIDAR_WEBOTS = LidarConfig(
+    n_rays=180,
+    fov_rad=math.radians(140.0),
+)
+"""Matches the ShepherdDog.proto Lidar device (180 rays, 140° FOV).
+
+Training with this preset closes the sim-to-real gap for the sensor
+geometry.  Because the observation is built from tracker output (not raw
+rays), a policy trained here can be deployed on a wider-FOV LiDAR (e.g.
+240° or 360°) without retraining: in principle, more FOV means more true
+detections, which should improve tracker quality.
+"""
+
+
+# ---------------------------------------------------------------------------
+# Cluster-detection pipeline
+# ---------------------------------------------------------------------------
+
+@dataclass(frozen=True)
+class DetectionConfig:
+    """Parameters for the LiDAR-scan → detection clustering pipeline."""
+
+    gap_threshold: float = 0.6
+    """Adjacent hit-points farther apart than this (metres) start a new cluster."""
+
+    max_cluster_span: float = 1.5
+    """Clusters wider than this (metres) are rejected as walls / structures."""
+
+    range_hit_eps: float = 0.05
+    """A ray is considered a hit if ``range < max_range - range_hit_eps``."""
+
+    split_range_gap: float = 0.20
+    """Range increase within a cluster that triggers a multi-peak split."""
+
+    wall_reject: float = 0.5
+    """Drop detections within this distance (metres) of any field wall."""
+
+    static_reject: float = 0.8
+    """Drop detections within this distance (metres) of known static features
+    (gate posts, field corners)."""
+
+    def __post_init__(self) -> None:
+        if self.wall_reject < 0.0:
+            raise ValueError(f"wall_reject must be ≥ 0, got {self.wall_reject}")
+        if self.static_reject < 0.0:
+            raise ValueError(f"static_reject must be ≥ 0, got {self.static_reject}")
+
+
+# ---------------------------------------------------------------------------
+# Multi-target tracker
+# ---------------------------------------------------------------------------
+
+@dataclass(frozen=True)
+class TrackerConfig:
+    """Parameters for the nearest-neighbour sheep tracker."""
+
+    gate_m: float = 2.5
+    """Primary NN association gate in metres (recently observed tracks)."""
+
+    reacquire_gate_m: float = 4.5
+    """Wider gate used when re-acquiring tracks stale for ≥ ``reacquire_min_age`` steps."""
+
+    reacquire_min_age: int = 20
+    """Minimum staleness (steps) before the wider re-acquisition gate activates."""
+
+    penned_gate_m: float = 4.0
+    """Gate for matching new detections to already-penned tracks."""
+
+    forget_steps: int = 200
+    """Delete an active track that has not been observed for this many steps (~3.2 s)."""
+
+    predict_steps: int = 120
+    """Extrapolate a track's position using constant velocity for this many steps (~1.9 s)."""
+
+    velocity_clamp: float = 1.0
+    """Maximum predicted speed (m/s) used during extrapolation."""
+
+    max_new_tracks_per_step: int = 10
+    """Maximum number of new tracks that may be spawned in a single step.
+
+    Capping this limits the damage from LiDAR false-positive bursts (e.g.
+    wall reflections in Webots) that would otherwise flood the track set.
+    The default (10 = MAX_SHEEP) preserves the original behaviour; reduce
+    to 1–3 for Webots deployment robustness (``HERDING_WEBOTS`` uses 1).
+    """
+
+    pen_latch_depth: float = 0.0
+    """Minimum depth past the gate line (metres) before a track is latched
+    as penned.  0.0 = original behaviour (latch at y ≤ GATE_Y).  Raise it
+    for Webots (``HERDING_WEBOTS`` uses 2.0) so gate-hardware LiDAR reflections
+    near y=-15 do not permanently consume tracker slots as false "penned" sheep.
+    """
+
+    def __post_init__(self) -> None:
+        if self.forget_steps < 1:
+            raise ValueError(f"forget_steps must be ≥ 1, got {self.forget_steps}")
+        if self.max_new_tracks_per_step < 1:
+            raise ValueError(
+                f"max_new_tracks_per_step must be ≥ 1, got {self.max_new_tracks_per_step}"
+            )
+
+
+# ---------------------------------------------------------------------------
+# Robot physical specification
+# ---------------------------------------------------------------------------
+
+@dataclass(frozen=True)
+class RobotConfig:
+    """Physical parameters of the shepherd-dog robot.
+
+    Values mirror ``protos/ShepherdDog.proto`` and ``protos/ShepherdDogMecanum.proto``.
+    """
+
+    wheel_radius: float = 0.038
+    """Wheel radius in metres."""
+
+    wheel_base: float = 0.28
+    """Axle-to-axle distance for differential drive (metres)."""
+
+    wheel_base_x: float = 0.28
+    """Front-to-back axle distance for mecanum drive (metres)."""
+
+    wheel_base_y: float = 0.28
+    """Left-to-right axle distance for mecanum drive (metres)."""
+
+    max_wheel_omega: float = 70.0
+    """Maximum wheel angular velocity (rad/s)."""
+
+    action_smooth: float = 0.0
+    """Exponential moving-average coefficient applied to actions inside the env.
+
+    ``0.0`` means no smoothing (gym default).
+    ``0.55`` matches the hard-coded EMA in ``shepherd_dog.py`` — use this
+    when training so the policy learns to act through the same filter it
+    sees at deployment.
+    """
+
+    def __post_init__(self) -> None:
+        if not (0.0 <= self.action_smooth < 1.0):
+            raise ValueError(
+                f"action_smooth must be in [0, 1), got {self.action_smooth}"
+            )
+
+    @property
+    def max_linear(self) -> float:
+        """Maximum achievable linear speed (m/s)."""
+        return self.wheel_radius * self.max_wheel_omega
+
+
+# ---------------------------------------------------------------------------
+# Domain randomisation
+# ---------------------------------------------------------------------------
+
+@dataclass(frozen=True)
+class DomainRandomConfig:
+    """Parameters that inject physics / sensor noise for domain randomisation.
+
+    All values default to 0 (disabled) so the base env is deterministic and
+    backwards-compatible.  Enable them gradually to close the sim-to-real gap.
+    """
+
+    fp_rate: float = 0.0
+    """Mean number of false-positive detections injected per step (Poisson λ).
+
+    FPs are placed near static features (walls, posts) with positional
+    noise ``fp_std_pos``, mimicking the spurious clusters Webots' physical
+    LiDAR returns from 3D geometry.
+    """
+
+    fp_std_pos: float = 0.3
+    """Positional standard deviation (metres) of injected false-positive clusters."""
+
+    wheel_slip_std: float = 0.0
+    """Gaussian noise standard deviation (rad/s) added to each wheel speed
+    before kinematic integration.  Models real-world wheel slip and motor
+    variation.  Suggested starting value: 0.05.
+    """
+
+    compass_noise_std: float = 0.0
+    """Gaussian noise standard deviation (radians) added to the heading
+    reading each step.  Models magnetometer drift in Webots.
+    Suggested starting value: 0.02.
+    """
+
+    def __post_init__(self) -> None:
+        if self.fp_rate < 0.0:
+            raise ValueError(f"fp_rate must be ≥ 0, got {self.fp_rate}")
+        if self.wheel_slip_std < 0.0:
+            raise ValueError(f"wheel_slip_std must be ≥ 0, got {self.wheel_slip_std}")
+        if self.compass_noise_std < 0.0:
+            raise ValueError(f"compass_noise_std must be ≥ 0, got {self.compass_noise_std}")
+
+
+# ---------------------------------------------------------------------------
+# Aggregate config
+# ---------------------------------------------------------------------------
+
+@dataclass(frozen=True)
+class HerdingConfig:
+    """Root configuration object passed to :class:`~training.herding_env.HerdingEnv`.
+
+    Sub-configs default to the original simulation parameters so that
+    ``HerdingEnv()`` and ``HerdingEnv(herding_cfg=HerdingConfig())`` produce
+    identical behaviour.
+    """
+
+    lidar: LidarConfig = field(default_factory=LidarConfig)
+    detection: DetectionConfig = field(default_factory=DetectionConfig)
+    tracker: TrackerConfig = field(default_factory=TrackerConfig)
+    robot: RobotConfig = field(default_factory=RobotConfig)
+    domain_random: DomainRandomConfig = field(default_factory=DomainRandomConfig)
+
+    def replace(self, **kwargs) -> "HerdingConfig":
+        """Return a new config with selected top-level sub-configs replaced.
+
+        Example::
+
+            cfg = HERDING_WEBOTS.replace(
+                domain_random=DomainRandomConfig(fp_rate=2.0, wheel_slip_std=0.05)
+            )
+        """
+        return replace(self, **kwargs)
+
+
+# ---------------------------------------------------------------------------
+# Named full-pipeline presets
+# ---------------------------------------------------------------------------
+
+HERDING_DEFAULT = HerdingConfig()
+"""Original simulation defaults — zero behaviour change."""
+
+HERDING_WEBOTS = HerdingConfig(
+    lidar=LIDAR_WEBOTS,
+    detection=DetectionConfig(wall_reject=0.5, static_reject=1.2),
+    tracker=TrackerConfig(
+        forget_steps=120,
+        max_new_tracks_per_step=1,
+        pen_latch_depth=2.0,
+    ),
+    robot=RobotConfig(action_smooth=0.55),
+)
+"""Webots-matched training preset.
+
+Changes vs HERDING_DEFAULT:
+* LiDAR: 180 rays / 140° FOV matching ShepherdDog.proto hardware
+* Detection: static_reject 0.8 → 1.2 (handles post FPs); wall_reject kept
+             at 0.5 m (1.0 m was too aggressive near the south gate)
+* Tracker: forget_steps 200 → 120 (~1.9 s), max_new_tracks_per_step 10 → 1
+           (rate-caps FP flooding), pen_latch_depth 0.0 → 2.0
+* Robot: action_smooth 0.0 → 0.55 (matches Webots controller EMA)
+"""