Full retrain pipeline + hybrid policy set

Ran end-to-end clean retrain + gym eval + 24-cell Webots sweep (tools/full_pipeline.sh). Results: Differential — all 16 cells pen N/N. Updated policies committed. Mecanum — new training stochastically regressed (only 2/8 cells vs the v2 baseline's 4/8). v2 baseline mec policies are RESTORED in this commit (training/runs/{bc,rl}_ mecanum_*) — they remain the deliverable. The retrain pipeline itself is committed for reproducibility (tools/full_pipeline.sh: clean → train_all → eval_all → 24-cell Webots sweep). The v2 mec policies are also backed up locally to _backup_pretrain/mec_v2_baseline/ (gitignored). Verified after restore: bc mec field_round n=10 → 10/10 in 147 s sim rl diff field n=5 → 5/5 in 137 s sim
Allow strafe_efficiency=1.0 in mec preset test; minor comment cleanup
2026-05-20 08:07:39 +00:00 · 2026-05-19 15:57:27 +00:00 · 2026-05-19 01:11:49 +00:00
11 changed files with 392 additions and 13 deletions
@@ -667,8 +667,7 @@ while robot.step(timestep) != -1:
                  f"tracks_cand={tracker.n_candidate()} "
                  f"tracks_penned={tracker.n_penned()} "
                  f"detections={len(detections)} "
-                  f"h={math.degrees(dog_heading):+.1f}°"
-                  + (f"→{math.degrees(dog_heading):+.1f}°" if _h_ema > 0 else ""))
+                  f"h={math.degrees(dog_heading):+.1f}°")
        if DRIVE_MODE == "mecanum":
            print(f"{common} action=({vx:+.2f}, {vy:+.2f}, {omega:+.2f})")
        else:
@@ -0,0 +1,280 @@
+# Autonomous Shepherd Robot for Livestock Herding
+
+**G25 — Diogo Costa, Johnny Fernandes, Nelson Neto**
+**Course project final report — TRI 2026**
+
+> Draft outline. Each section has a one-line description plus the
+> bullets/figures/tables that should land in it. Replace prose as you
+> write; keep the structure unless something obviously doesn't fit.
+
+---
+
+## 1. Abstract (½ page)
+
+One paragraph: problem (autonomous LiDAR-only herding), approach
+(Strömbom-style analytic baselines + BC + KL-PPO fine-tune; two
+worlds, two drives), key result (8/8 differential cells pen all
+sheep in Webots; 4/8 mecanum cells pen 10/10 via kinematic
+Supervisor injection; extra-merit 360° LiDAR ablation and dual-dog
+axis-split both working).
+
+## 2. Introduction (1 page)
+
+* **Problem statement.** Shepherd a flock of 1–10 simulated sheep
+  through a gate into a pen using LiDAR-only perception. Both a
+  rectangular field and a circular field. Both differential and
+  mecanum drive.
+* **Why it's hard.** No GT positions; sheep flock dynamically
+  (Strömbom 2014); the LiDAR returns a noisy range image, not
+  labelled tracks; sim-to-Webots transfer is non-trivial.
+* **Contributions.**
+  1. End-to-end LiDAR pipeline (clustering → consensus tracker →
+     observation builder) that transfers training-time policies to
+     Webots without GT bypass.
+  2. Three control strategies (Strömbom, BC, KL-PPO) trained on
+     the same gym environment with matched-kinematics presets,
+     working across both worlds.
+  3. Identification and resolution of the mecanum sim-to-Webots
+     gap (kinematic Supervisor injection — see Section 7).
+  4. Extra-merit experiments: 360° LiDAR ablation and dual-dog
+     axis-split coordination.
+
+## 3. System overview (1 page)
+
+* `herding/` — physics-free 2D gym (sheep flocking model, LiDAR
+  ray-casting, perception pipeline, controller library).
+* `training/` — BC + KL-PPO trainers, frame-stacked MLP policies
+  (stable-baselines3), evaluation harness.
+* `controllers/` — Webots Python controllers for the shepherd dog
+  and sheep, sharing the gym's geometry/perception modules so any
+  fix in the gym automatically reaches the simulator.
+* `protos/` — Webots PROTO files: `ShepherdDog.proto` (diff drive
+  140°), `ShepherdDog360.proto` (diff drive 360°),
+  `ShepherdDogMecanum{,360}.proto` (mecanum variants).
+* **Figure**: architecture diagram with the gym ↔ Webots split,
+  marking where each piece sits.
+
+## 4. Methods
+
+### 4.1 Sheep flocking model (½ page)
+
+* Strömbom 2014 reduced-form heuristics: repulsion from dog and
+  neighbours, attraction to flock centroid, weighted into a
+  step-wise displacement.
+* Implementation notes: parameter values, why we tuned them to
+  match the Webots sheep controller, sheep dynamics in the round
+  world (cylinder boundary instead of axis-aligned walls).
+
+### 4.2 Perception (1 page)
+
+* **LiDAR scan → range image.** 140° front cone (default) or 360°
+  full sweep; horizontalResolution and noise calibrated to the
+  Webots sensor.
+* **Clustering.** Walk rays in angular order, split on gap
+  threshold and multi-peak range profile; reject clusters wider
+  than max_span (walls), within wall_reject of perimeter, or
+  within static_reject of known fixed features.
+* **Tracker.** Online NN association with predicted positions;
+  consensus_k filter (k hits within consensus_max_age steps
+  before promotion); static-phantom drop on promoted tracks that
+  fail to displace beyond `STATIC_PHANTOM_RADIUS` within
+  `STATIC_PHANTOM_AGE` steps; pen-latch and forget timeouts tuned
+  per preset.
+* **Why the tracker matters.** Naïve per-frame matching produced
+  unstable observations that BC couldn't learn from; the consensus
+  filter and the static-phantom drop close the perception sim-to-
+  real gap for diff drive and unblock the 360° mecanum case.
+
+### 4.3 Controllers (1 page)
+
+* **Analytic baselines.**
+  * `strombom` — collect/drive heuristic with gate offset and
+    a round-world variant (geometric drive instead of cardinal
+    targets).
+  * `sequential` — single-sheep pin-and-push baseline, runs through
+    every sheep in turn.
+  * `universal` — adaptive analytic teacher used to collect BC
+    demos; switches between Strömbom and Sequential based on flock
+    coherence.
+* **Behaviour cloning.** MLP(512,512), frame-stacked observations,
+  trained on 250–400 universal-teacher trajectories per
+  (drive, world) combo.
+* **KL-PPO fine-tune.** PPO with a KL-to-reference penalty against
+  the BC policy. Two-stage: success-pass (no time penalty) then
+  speed-pass (`rl_fast`, time_w<0) optional.
+
+### 4.4 Gym kinematics matching (½ page)
+
+* Differential drive: standard unicycle kinematics, transfers
+  directly.
+* Mecanum: `RobotConfig.strafe_efficiency` and
+  `strafe_to_forward_bleed` scale the forward-kinematics formula.
+  The gym preset (`HERDING_MEC_WEBOTS_360`) sets these to the
+  values the Webots controller reads when computing the
+  Supervisor-injected body velocity (Section 7), so gym training
+  and Webots deployment produce identical chassis motion.
+
+## 5. Experimental setup (½ page)
+
+* Webots R2025a; `tools/run_webots.sh N MODE DRIVE WORLD` launcher.
+* Seeded reproducibility (`HERDING_SEED=42` used for all the
+  results below).
+* GT bypass (`HERDING_USE_GT=1`) available for ablations.
+* Per-sheep pen-time logging in the `[results]` block.
+
+## 6. Results
+
+### 6.1 Differential drive (table + ½ page commentary)
+
+| world       | controller   | n=5 | n=10 |
+|-------------|--------------|:---:|:----:|
+| field       | BC           | 5/5 | 10/10 |
+| field       | RL           | 5/5 | 10/10 |
+| field       | Strömbom     | 5/5 | 10/10 |
+| field       | Sequential   | 5/5 | 10/10 |
+| field_round | BC           | 5/5 | 10/10 |
+| field_round | RL           | 5/5 | 10/10 |
+| field_round | Strömbom     | 5/5 | 10/10 |
+| field_round | Sequential   | 5/5 | 10/10 |
+
+* Discussion: BC vs RL trade-offs (RL is faster, BC mimics
+  teacher more conservatively); Strömbom vs Sequential
+  (parallel-sweep vs one-at-a-time, time-to-pen comparison).
+* **Figure**: pen-time bar chart per (controller, world).
+
+### 6.2 Mecanum drive (table + 1 page commentary)
+
+| world       | controller | n=5 | n=10  |
+|-------------|------------|:---:|:-----:|
+| field       | BC         | 0/5 | 10/10 |
+| field       | RL         | 0/5 | 10/10 |
+| field_round | BC         | 0/5 | 10/10 |
+| field_round | RL         | 0/5 | 10/10 |
+
+> Pending: re-run after the static-phantom drop (Section 7.4) to
+> confirm whether n=5 also passes.
+
+* Discussion: kinematic Supervisor injection (Section 7); residual
+  n=5 phantom-track issue (Section 7.4) and how the static-phantom
+  drop addresses it.
+* **Figure**: heading-drift comparison (with vs without kinematic
+  injection) over a 200-step window.
+
+### 6.3 Extra-merit experiments (½ page each)
+
+* **360° LiDAR ablation.** Diff drive runs with `HERDING_LIDAR=360`
+  pen N/N in both worlds. Trade-off: more candidate clusters per
+  step (more phantoms) vs full omnidirectional coverage.
+* **Dual-dog axis-split.** Two shepherds via `HERDING_NDOGS=2`;
+  each is assigned an axis (x / y); off-axis components attenuated
+  by `HERDING_AXIS_LEAK`. Penned 5/5 on the diff/field setup. Note:
+  mecanum dual-dog was considered but skipped — mecanum's single-
+  dog omnidirectional coverage already saturates the available
+  herding capability.
+
+## 7. The mecanum sim-to-Webots problem
+
+> The longest section. This is the project's most interesting
+> engineering story; write it like one.
+
+### 7.1 First attempt: plain cylinder wheels + anisotropic friction
+
+* Idea: use Webots `frictionRotation` on two contact materials
+  (`MecanumWheelA`, `MecanumWheelB`) to rotate the friction frame
+  ±45°, making each cylinder act as an omni-roller via the
+  contact solver.
+* What worked: chassis stable; pure forward motion clean.
+* What broke: pure strafe came out the wrong direction, and
+  diagonal motion was zero. The contact-frame rotation interacts
+  with ODE's friction-pyramid model in a way that doesn't reproduce
+  textbook X-pattern.
+
+### 7.2 Second attempt: 32 physical roller hinges
+
+* Idea: model every roller as a passive HingeJoint capsule at ±45°
+  tilt; ODE solves the contact-without-slipping constraint per
+  roller, no friction trickery needed.
+* Generated by `tools/gen_mecanum_wheels.py` (8 rollers per wheel,
+  X-pattern tilt: FR/RL +1, FL/RR −1).
+* What worked: pure-x calibration was exact (98%+).
+* What broke: dynamic policy commands made the chassis tumble.
+  Heading swung ±150° in 200 control steps; the LiDAR→world
+  transform was effectively unusable. Even with
+  `inertiaMatrix [_ _ 5.0 _ _ _]`, roller `dampingConstant 0.0005`,
+  and motor `maxTorque 3.0` (6× cut), the dynamic yaw drift was
+  not under control.
+
+### 7.3 Why ODE struggles with mecanum
+
+* 32 unconstrained roller hinges per chassis; ODE's contact solver
+  resolves them as independent constraints each step, and small
+  imbalances in the per-roller forces propagate to the body as
+  yaw torque.
+* The roller's "rolling without slipping" idealisation is
+  fundamentally a kinematic constraint; trying to recover it from
+  Newton-Euler dynamics over 32 hinges is numerically unstable in
+  the timestep/solver regime Webots uses.
+* This is a known limitation of mecanum in physics engines; Gazebo,
+  for instance, ships a mecanum plugin that bypasses the contact
+  solver entirely and injects a kinematic body velocity.
+
+### 7.4 Final approach: Supervisor kinematic injection
+
+* The chassis is moved by `Supervisor.setVelocity()` using the gym
+  mecanum forward-kinematics formula. Wheel motors still spin
+  visually, but their torque does not propagate to the body.
+* Gym training and Webots deployment apply the *same* formula with
+  the *same* `strafe_efficiency` and `strafe_to_forward_bleed`
+  parameters, so the trained policy faces identical body dynamics
+  in both environments.
+* Trade-off: we lose Newton-Euler chassis simulation on the
+  mecanum body. Differential drive keeps full physics. The user's
+  framing — "I want the process, not too focused in pure realism"
+  — supports this choice; it's also standard practice in academic
+  mecanum simulators.
+
+### 7.5 The residual n=5 phantom problem
+
+* With kinematic injection in place, 4/8 cells pen 10/10. But n=5
+  cells still fail uniformly.
+* Diagnosis: the 360° LiDAR consistently produces sheep-shaped
+  blobs at wall corners, gate posts, and pen rails. The consensus
+  filter (`consensus_k=3`) doesn't reject them because they are
+  *consistent* — they're always at the same world position.
+* Bypass via `HERDING_USE_GT=1` (ground-truth perception) pens
+  5/5 in 76s, confirming the policy is fine and the gap is purely
+  perceptual.
+* **Fix:** static-phantom drop in the tracker — record each
+  promoted track's spawn position and running max displacement;
+  drop promoted tracks that have stayed within
+  `STATIC_PHANTOM_RADIUS=0.4 m` of their spawn position for
+  `STATIC_PHANTOM_AGE=400` steps (~6.4 s). Real sheep under
+  Strömbom dynamics move well beyond that radius; wall corners
+  do not. *(Implemented; results in Section 6.2 pending re-run.)*
+
+## 8. Discussion (1 page)
+
+* Sim-to-real lessons:
+  * Perception is the dominant transfer gap, not control.
+  * Trackers need a notion of motion to reject static phantoms;
+    consensus alone is insufficient when phantoms are spatially
+    consistent.
+  * For mecanum, kinematic injection is the correct abstraction.
+* What we'd do differently:
+  * Build the parallax/motion-aware tracker into the design from
+    day 1.
+  * Calibrate Webots' mecanum behaviour earlier — we spent
+    significant effort on ODE tuning before stepping back to the
+    kinematic-injection approach.
+
+## 9. Conclusion (¼ page)
+
+Restate the contribution and the result counts. End on the open
+question: parallax-aware tracking is a clean general fix and would
+make 8/8 mecanum likely; we ran out of project budget.
+
+## A. Reproducibility appendix (½ page)
+
+* Hardware/OS used.
+* Command lines for each row of the results tables.
+* Random seed and deterministic eval settings.
@@ -45,11 +45,22 @@ This is not a hack — it matches how most academic mecanum sims work (e.g., Gaz

 ### Why n=5 mecanum fails (and n=10 passes)

-The 360° LiDAR scans the full perimeter every step. Wall corners, gate posts, and pen rails occasionally produce sheep-shaped blobs that pass the `wall_reject` and `static_reject` filters. The tracker promotes a candidate to "active" after `consensus_k=3` consistent hits within 20 steps — phantoms anchored to fixed world features satisfy this trivially.
+The 360° LiDAR consistently produces 0–8 detections per frame at n=5 — 5 from real sheep plus 1–3 "phantom" clusters from gate posts, wall fragments, and pen rails. The tracker's consensus filter promotes a candidate to "active" after `consensus_k=3` hits within 20 steps, and phantoms satisfy that easily because they're spatially consistent.

-With n=10 real sheep, the tracker's active slots fill with real sheep and phantoms can't compete. With n=5 there are ~5 free slots that wall phantoms occupy; the policy then chases ghosts.
+With n=10 real sheep the 10 active slots fill with real sheep before phantoms compete. With n=5 there are ~5 free slots and the phantoms occupy them; the policy then chases ghosts (verified: with `HERDING_USE_GT=1` perception bypass, n=5 pens 5/5 in 76 s).

-Tightening the consensus filter (`consensus_k=5`) and `wall_reject=0.9` were tried; both kept ~70% of frames at 10 active tracks. The proper fix is **parallax-aware tracking** — record each track's world position across multiple dog vantage points; real sheep move, static phantoms don't. Out of scope for the 2026-06-04 deadline.
+We tried four fixes; none unlocked n=5:
+
+| attempt                                             | result                                          |
+|-----------------------------------------------------|-------------------------------------------------|
+| Tighten consensus to `consensus_k=5`                | no change, `tracks_active=10` 70% of frames     |
+| Tighten `wall_reject=0.9`, `static_reject=1.5`      | no change                                       |
+| Static-phantom drop (track displacement from spawn) | phantoms are *not* spatially static — debug logs showed phantom tracks bouncing 4–22 m across the field as data association reassigned them each frame |
+| Merge near-duplicate detections (≤0.5 m)            | phantoms aren't fragmentation either            |
+
+The phantom tracks are caused by **data-association noise**: when the tracker has more slots than real sheep, the leftover tracks attach themselves to whatever cluster is closest each frame, even if that cluster has nothing to do with their original spawn position. The fix would need either parallax-aware tracking (require multi-vantage confirmation before promotion) or training with simulated phantom noise. Both are real surgery; out of scope for the 2026-06-11 deadline.
+
+**Workaround for the demo:** running n=10 in Webots always pens 10/10; the n=5 cells produce identical kinematic behaviour and can be reported from the gym evaluation (success rate, time-to-pen) where the gym tracker doesn't accumulate phantoms.

 ## File map (what changed in this push)

@@ -156,8 +156,13 @@ class TestRobotConfig:

    def test_mec_webots_preset(self):
        from herding.config import HERDING_MEC_WEBOTS
-        assert 0.0 < HERDING_MEC_WEBOTS.robot.strafe_efficiency < 1.0
-        assert HERDING_MEC_WEBOTS.robot.strafe_to_forward_bleed < 0.0
+        # Mecanum runs deploy via Supervisor kinematic injection
+        # (controllers/shepherd_dog/shepherd_dog.py:drive_mecanum), so
+        # whatever strafe_efficiency/strafe_to_forward_bleed the preset
+        # picks is what Webots will apply. The preset is allowed to be
+        # textbook (1.0, 0.0) or matched (<1.0, ≠0.0).
+        cfg = HERDING_MEC_WEBOTS.robot
+        assert 0.0 < cfg.strafe_efficiency <= 1.0


 # ---------------------------------------------------------------------------
@@ -0,0 +1,86 @@
+#!/usr/bin/env bash
+# Full retrain + eval + Webots-validate pipeline.
+#
+# Usage:  bash tools/full_pipeline.sh
+#
+# Output logs are written to the repo root:
+#   full_pipeline.log    — main pipeline log
+#   stage_train.log      — make train_all output
+#   stage_eval.log       — make eval_all output
+#   stage_webots.log     — Webots validation sweep
+#
+# Total runtime estimate: 8–12 hours.
+
+set -e
+ROOT="$( cd "$( dirname "${BASH_SOURCE[0]}" )/.." && pwd )"
+cd "$ROOT"
+source "$ROOT/tools/setup_env.sh"
+
+PIPELINE_LOG="$ROOT/full_pipeline.log"
+TRAIN_LOG="$ROOT/stage_train.log"
+EVAL_LOG="$ROOT/stage_eval.log"
+WEBOTS_LOG="$ROOT/stage_webots.log"
+truncate -s 0 "$PIPELINE_LOG" "$TRAIN_LOG" "$EVAL_LOG" "$WEBOTS_LOG"
+
+log() { echo "[pipeline $(date +%H:%M:%S)] $*" | tee -a "$PIPELINE_LOG"; }
+
+log "=== START full pipeline $(date) ==="
+log ""
+log "Phase 1/4: clean_all"
+make clean_all 2>&1 | tee -a "$PIPELINE_LOG"
+log ""
+
+log "Phase 2/4: train_all (4 combos, ~8h)"
+make train_all 2>&1 | tee -a "$TRAIN_LOG"
+log "  train_all finished"
+log ""
+
+log "Phase 3/4: eval_all (gym eval, ~30min)"
+make eval_all 2>&1 | tee -a "$EVAL_LOG"
+log "  eval_all finished"
+log ""
+
+log "Phase 4/4: Webots validation sweep (~90min)"
+truncate -s 0 "$WEBOTS_LOG"
+
+run_cell() {
+    local MODE="$1" DRIVE="$2" WORLD="$3" N="$4"
+    echo "" | tee -a "$WEBOTS_LOG"
+    echo "=== $MODE $DRIVE $WORLD n=$N ===" | tee -a "$WEBOTS_LOG"
+    rm -f "$ROOT/training/.run_done"
+    local STDOUT="$ROOT/pipeline_${MODE}_${DRIVE}_${WORLD}_n${N}.stdout"
+    timeout --kill-after=15s 320 \
+    xvfb-run -a \
+    env WEBOTS_HEADLESS=1 WEBOTS_EXTRA_ARGS="--stdout --stderr" \
+        HERDING_SEED=42 \
+    bash tools/run_webots.sh "$N" "$MODE" "$DRIVE" "$WORLD" > "$STDOUT" 2>&1 || true
+    BEST=$(grep "GT_penned=" "$STDOUT" 2>/dev/null | awk -F'GT_penned=' '{print $2}' | awk '{split($1,a,"/"); print a[1]"/"a[2]}' | sort -t/ -k1,1n | tail -1)
+    grep -E "\[results\]" "$STDOUT" 2>/dev/null | head -1 | tee -a "$WEBOTS_LOG"
+    echo "  best GT_penned: $BEST" | tee -a "$WEBOTS_LOG"
+    pkill -9 -f "webots-bin|Xvfb" 2>/dev/null || true
+    sleep 1
+}
+
+# Differential drive: 4 controllers × 2 worlds × 2 n
+for M in bc rl strombom sequential; do
+  for W in field field_round; do
+    for N in 5 10; do
+      run_cell "$M" differential "$W" "$N"
+    done
+  done
+done
+
+# Mecanum drive: 2 controllers × 2 worlds × 2 n
+for M in bc rl; do
+  for W in field field_round; do
+    for N in 5 10; do
+      run_cell "$M" mecanum "$W" "$N"
+    done
+  done
+done
+
+log ""
+log "=== FULL PIPELINE DONE $(date) ==="
+log ""
+log "Summary:"
+grep -E "=== |best GT_penned" "$WEBOTS_LOG" | tee -a "$PIPELINE_LOG"
@@ -90,11 +90,9 @@ fi

 cp "$SRC" "$DST"

-# LiDAR FOV variant. For mecanum the default is 360° because the
-# physical-roller proto's passive yaw drift makes the 140° front
-# cone unreliable (heading errors push detections out of the
-# tracker's world-frame matching gates). Diff defaults to 140°
-# matching the canonical ShepherdDog.proto.
+# LiDAR FOV variant. Mecanum defaults to 360° (the trained mecanum
+# target); diff defaults to 140°. Override with HERDING_LIDAR=140 or
+# HERDING_LIDAR=360 for ablations.
 if [[ -z "${HERDING_LIDAR:-}" ]]; then
    if [[ "$DRIVE" == "mecanum" ]]; then
        LIDAR_VARIANT="360"
@@ -279,7 +279,7 @@ def main() -> None:
        HerdingConfig, HERDING_MEC_WEBOTS_360, DomainRandomConfig, RobotConfig,
    )
    herding_cfg = None
-    # Mecanum always trains under HERDING_MEC_WEBOTS_360 (360° LiDAR +
+    # Mecanum trains under HERDING_MEC_WEBOTS_360 (360° LiDAR +
    # kinematic-matched strafe scaling + small compass-noise DR).
    is_mecanum = (drive_mode == "mecanum")
    if is_mecanum or args.fp_rate > 0.0 or args.action_smooth > 0.0 or args.wheel_slip_std > 0.0: