Webots sim-to-real fixes, DAgger pipeline, 360° proto variant

Today's session worked across the full Webots delivery stack — found and
fixed a cluster of bugs blocking the BC/RL transfer, then explored
training-side mitigations for the residual perception gap.

Bug fixes:
- Makefile FP_RATE default 2.0 → 0.0: BC demos used fp_rate=0 but RL
  fine-tune defaulted to fp_rate=2, poisoning the BC obs distribution
  and stalling PPO at 0% success across 1.46M+ steps.
- controllers/{shepherd_dog,sheep}/runtime.ini: Webots was launching
  controllers under system python3 (no numpy) and they were crashing
  silently. Pinned to the conda tir env.
- herding/config.py HERDING_WEBOTS preset: pen_latch_depth 0.5 → 2.0,
  max_new_tracks_per_step 3 → 1, static_reject 0.8 → 1.2. Stops phantom
  FPs near the gate from latching as permanently-penned tracks.
- herding/perception/sheep_tracker.py: penned tracks now decay at
  forget_steps × 8 instead of living forever. Adds get_positions
  min_freshness filter for deploy-time use.

Training/eval matches deployment:
- training/bc/collect.py: --dagger-policy flag for DAgger rollouts
  (policy drives, teacher labels) + --use-webots-preset for matched
  140° tracker + DR config.
- controllers/shepherd_dog/shepherd_dog.py: scan-fallback (0, 0.6) when
  BC/RL sees empty sheep_positions — recovers from FOV gaps.

Tooling:
- tools/dagger_round.sh: one-shot DAgger round (collect + concat + bc).
- tools/webots_sweep_gt.sh: full sweep with HERDING_USE_GT=1 for the
  perception-gap diagnosis matrix.
- protos/ShepherdDog360.proto: 360° FOV variant for the FOV-ablation
  comparison. Canonical proto stays at 140° per project spec.

Artifacts: v1 BC/RL policies for all 4 (drive × world) combos trained
in clean gym (success: diff/field 90-100%, diff/round 58%, mec/field
60-100%, mec/round 50-100%). DAgger r1/r2 BCs for diff/field show
12%→38% progression on gym HERDING_WEBOTS proxy but did not close
to actual Webots LiDAR (0/5 throughout). Next: LSTM policy or
learned tracker per the project-state memory.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
Johnny Fernandes
2026-05-16 17:21:02 +00:00
parent c61df91950
commit dd5ac669e5
34 changed files with 2336 additions and 188 deletions
+335
View File
@@ -0,0 +1,335 @@
"""Central configuration dataclasses for the herding simulation.
Every tunable constant that previously lived as a module-level literal in
perception/lidar_sim.py, perception/lidar_perception.py,
perception/sheep_tracker.py, world/geometry.py, or training/herding_env.py
is now represented here as a field with its original default value.
Usage — use the module defaults unchanged::
env = HerdingEnv() # same behaviour as before
Override a subset of parameters::
from herding.config import HerdingConfig, TrackerConfig
cfg = HerdingConfig(tracker=TrackerConfig(forget_steps=60))
env = HerdingEnv(herding_cfg=cfg)
Use a named preset for Webots-matched training::
from herding.config import HERDING_WEBOTS
env = HerdingEnv(herding_cfg=HERDING_WEBOTS)
Design notes
------------
* All dataclasses are frozen — instances are immutable after construction.
* This module must not import from other ``herding.*`` packages to avoid
import cycles. Field-geometry constants (pen coordinates, field size)
stay in ``herding.world.geometry`` because they depend on the world
variant selected at runtime via ``HERDING_WORLD``.
"""
from __future__ import annotations
import math
from dataclasses import dataclass, field, replace
# ---------------------------------------------------------------------------
# LiDAR hardware spec
# ---------------------------------------------------------------------------
@dataclass(frozen=True)
class LidarConfig:
"""Parameters of the simulated / physical LiDAR sensor.
The two canonical presets are :data:`LIDAR_FULL` (360°, oracle mode)
and :data:`LIDAR_WEBOTS` (140°/180-ray, matches the ShepherdDog proto).
"""
n_rays: int = 360
"""Number of rays in the scan."""
fov_rad: float = 2.0 * math.pi
"""Full field-of-view in radians, centred on the robot's forward axis."""
max_range: float = 12.0
"""Maximum detectable range in metres."""
noise_std: float = 0.005
"""Gaussian standard deviation (metres) applied to each hit reading."""
sheep_radius: float = 0.30
"""Effective disc radius of a sheep in the 2-D LiDAR plane (metres)."""
post_radius: float = 0.25
"""Effective disc radius of gate / corner posts (metres)."""
def __post_init__(self) -> None:
if self.n_rays < 1:
raise ValueError(f"n_rays must be ≥ 1, got {self.n_rays}")
if not (0.0 < self.fov_rad <= 2.0 * math.pi):
raise ValueError(f"fov_rad must be in (0, 2π], got {self.fov_rad:.4f}")
if self.max_range <= 0.0:
raise ValueError(f"max_range must be > 0, got {self.max_range}")
# Named presets -----------------------------------------------------------
LIDAR_FULL = LidarConfig(
n_rays=360,
fov_rad=2.0 * math.pi,
)
"""360° full-circle scan — oracle / ablation mode."""
LIDAR_WEBOTS = LidarConfig(
n_rays=180,
fov_rad=math.radians(140.0),
)
"""Matches the ShepherdDog.proto Lidar device (180 rays, 140° FOV).
Training with this preset closes the sim-to-real gap for the sensor
geometry. Because the observation is built from tracker output (not raw
rays), a policy trained here can be deployed on a wider-FOV LiDAR (e.g.
240° or 360°) without retraining — more FOV means more true detections,
which can only improve tracker quality.
"""
# ---------------------------------------------------------------------------
# Cluster-detection pipeline
# ---------------------------------------------------------------------------
@dataclass(frozen=True)
class DetectionConfig:
"""Parameters for the LiDAR-scan → detection clustering pipeline."""
gap_threshold: float = 0.6
"""Adjacent hit-points farther apart than this (metres) start a new cluster."""
max_cluster_span: float = 1.5
"""Clusters wider than this (metres) are rejected as walls / structures."""
range_hit_eps: float = 0.05
"""A ray is considered a hit if ``range < max_range - range_hit_eps``."""
split_range_gap: float = 0.20
"""Range increase within a cluster that triggers a multi-peak split."""
wall_reject: float = 0.5
"""Drop detections within this distance (metres) of any field wall."""
static_reject: float = 0.8
"""Drop detections within this distance (metres) of known static features
(gate posts, field corners)."""
def __post_init__(self) -> None:
if self.wall_reject < 0.0:
raise ValueError(f"wall_reject must be ≥ 0, got {self.wall_reject}")
if self.static_reject < 0.0:
raise ValueError(f"static_reject must be ≥ 0, got {self.static_reject}")
# ---------------------------------------------------------------------------
# Multi-target tracker
# ---------------------------------------------------------------------------
@dataclass(frozen=True)
class TrackerConfig:
"""Parameters for the nearest-neighbour sheep tracker."""
gate_m: float = 2.5
"""Primary NN association gate in metres (recently observed tracks)."""
reacquire_gate_m: float = 4.5
"""Wider gate used when re-acquiring tracks stale for ≥ ``reacquire_min_age`` steps."""
reacquire_min_age: int = 20
"""Minimum staleness (steps) before the wider re-acquisition gate activates."""
penned_gate_m: float = 4.0
"""Gate for matching new detections to already-penned tracks."""
forget_steps: int = 200
"""Delete an active track that has not been observed for this many steps (~3.2 s)."""
predict_steps: int = 120
"""Extrapolate a track's position using constant velocity for this many steps (~1.9 s)."""
velocity_clamp: float = 1.0
"""Maximum predicted speed (m/s) used during extrapolation."""
max_new_tracks_per_step: int = 10
"""Maximum number of new tracks that may be spawned in a single step.
Capping this limits the damage from LiDAR false-positive bursts (e.g.
wall reflections in Webots) that would otherwise flood the track set.
The default (10 = MAX_SHEEP) preserves the original behaviour; reduce
to 23 for Webots deployment robustness.
"""
pen_latch_depth: float = 0.0
"""Minimum depth past the gate line (metres) before a track is latched
as penned. 0.0 = original behaviour (latch at y ≤ GATE_Y). Increase
to 0.5 for Webots to prevent gate-hardware LiDAR reflections near y=-15
from permanently consuming tracker slots as false "penned" sheep.
"""
def __post_init__(self) -> None:
if self.forget_steps < 1:
raise ValueError(f"forget_steps must be ≥ 1, got {self.forget_steps}")
if self.max_new_tracks_per_step < 1:
raise ValueError(
f"max_new_tracks_per_step must be ≥ 1, got {self.max_new_tracks_per_step}"
)
# ---------------------------------------------------------------------------
# Robot physical specification
# ---------------------------------------------------------------------------
@dataclass(frozen=True)
class RobotConfig:
"""Physical parameters of the shepherd-dog robot.
Values mirror ``protos/ShepherdDog.proto`` and ``protos/ShepherdDogMecanum.proto``.
"""
wheel_radius: float = 0.038
"""Wheel radius in metres."""
wheel_base: float = 0.28
"""Axle-to-axle distance for differential drive (metres)."""
wheel_base_x: float = 0.28
"""Front-to-back axle distance for mecanum drive (metres)."""
wheel_base_y: float = 0.28
"""Left-to-right axle distance for mecanum drive (metres)."""
max_wheel_omega: float = 70.0
"""Maximum wheel angular velocity (rad/s)."""
action_smooth: float = 0.0
"""Exponential moving-average coefficient applied to actions inside the env.
``0.0`` means no smoothing (gym default).
``0.55`` matches the hard-coded EMA in ``shepherd_dog.py`` — use this
when training so the policy learns to act through the same filter it
sees at deployment.
"""
def __post_init__(self) -> None:
if not (0.0 <= self.action_smooth < 1.0):
raise ValueError(
f"action_smooth must be in [0, 1), got {self.action_smooth}"
)
@property
def max_linear(self) -> float:
"""Maximum achievable linear speed (m/s)."""
return self.wheel_radius * self.max_wheel_omega
# ---------------------------------------------------------------------------
# Domain randomisation
# ---------------------------------------------------------------------------
@dataclass(frozen=True)
class DomainRandomConfig:
"""Parameters that inject physics / sensor noise for domain randomisation.
All values default to 0 (disabled) so the base env is deterministic and
backwards-compatible. Enable them gradually to close the sim-to-real gap.
"""
fp_rate: float = 0.0
"""Mean number of false-positive detections injected per step (Poisson λ).
FPs are placed near static features (walls, posts) with positional
noise ``fp_std_pos``, mimicking the spurious clusters Webots' physical
LiDAR returns from 3D geometry.
"""
fp_std_pos: float = 0.3
"""Positional standard deviation (metres) of injected false-positive clusters."""
wheel_slip_std: float = 0.0
"""Gaussian noise standard deviation (rad/s) added to each wheel speed
before kinematic integration. Models real-world wheel slip and motor
variation. Suggested starting value: 0.05.
"""
compass_noise_std: float = 0.0
"""Gaussian noise standard deviation (radians) added to the heading
reading each step. Models magnetometer drift in Webots.
Suggested starting value: 0.02.
"""
def __post_init__(self) -> None:
if self.fp_rate < 0.0:
raise ValueError(f"fp_rate must be ≥ 0, got {self.fp_rate}")
if self.wheel_slip_std < 0.0:
raise ValueError(f"wheel_slip_std must be ≥ 0, got {self.wheel_slip_std}")
if self.compass_noise_std < 0.0:
raise ValueError(f"compass_noise_std must be ≥ 0, got {self.compass_noise_std}")
# ---------------------------------------------------------------------------
# Aggregate config
# ---------------------------------------------------------------------------
@dataclass(frozen=True)
class HerdingConfig:
"""Root configuration object passed to :class:`~training.herding_env.HerdingEnv`.
Sub-configs default to the original simulation parameters so that
``HerdingEnv()`` and ``HerdingEnv(herding_cfg=HerdingConfig())`` produce
identical behaviour.
"""
lidar: LidarConfig = field(default_factory=LidarConfig)
detection: DetectionConfig = field(default_factory=DetectionConfig)
tracker: TrackerConfig = field(default_factory=TrackerConfig)
robot: RobotConfig = field(default_factory=RobotConfig)
domain_random: DomainRandomConfig = field(default_factory=DomainRandomConfig)
def replace(self, **kwargs) -> "HerdingConfig":
"""Return a new config with selected top-level sub-configs replaced.
Example::
cfg = HERDING_WEBOTS.replace(
domain_random=DomainRandomConfig(fp_rate=2.0, wheel_slip_std=0.05)
)
"""
return replace(self, **kwargs)
# ---------------------------------------------------------------------------
# Named full-pipeline presets
# ---------------------------------------------------------------------------
HERDING_DEFAULT = HerdingConfig()
"""Original simulation defaults — zero behaviour change."""
HERDING_WEBOTS = HerdingConfig(
lidar=LIDAR_WEBOTS,
detection=DetectionConfig(wall_reject=0.5, static_reject=1.2),
tracker=TrackerConfig(
forget_steps=120,
max_new_tracks_per_step=1,
pen_latch_depth=2.0,
),
robot=RobotConfig(action_smooth=0.55),
)
"""Webots-matched training preset.
Changes vs HERDING_DEFAULT:
* LiDAR: 180 rays / 140° FOV matching ShepherdDog.proto hardware
* Detection: wall_reject kept at 0.5 m (original default; static_reject
handles post FPs; 1.0 m was too aggressive near the south gate)
* Tracker: forget_steps 200 → 60 (~1 s ghost-track lifetime)
max_new_tracks_per_step 10 → 3 (rate-caps FP flooding)
* Robot: action_smooth 0.0 → 0.55 (matches Webots controller EMA)
"""