From 62ea811655b430d7cc086c08df9ed1ad90738eaf Mon Sep 17 00:00:00 2001
From: Johnny Fernandes <up202402612@up.pt>
Date: Tue, 19 May 2026 01:11:49 +0000
Subject: [PATCH] Fix _h_ema NameError; add status + article-draft notes

- shepherd_dog: the status print still referenced _h_ema, left over
  from the removed HERDING_HEADING_EMA helper, and raised NameError on
  every controller startup. Drop it.
- docs/status.md: expand the n=5 mecanum failure-mode discussion with
  the four phantom-suppression attempts that didn't pan out, and the
  honest workaround (Webots reports n=10 only, n=5 covered by gym
  results).
- docs/article_draft.md: project-report outline with section structure,
  results tables, and the mecanum sim-to-real narrative for the
  formal writeup.
---
 controllers/shepherd_dog/shepherd_dog.py |   3 +-
 docs/article_draft.md                    | 280 +++++++++++++++++++++++
 docs/status.md                           |  17 +-
 3 files changed, 295 insertions(+), 5 deletions(-)
 create mode 100644 docs/article_draft.md
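
Note for reviewers: the static-phantom drop described in Section 7.5 of
docs/article_draft.md can be read as the rule sketched below. This is a
minimal illustration, not the tracker code from this repository:
`PromotedTrack` and `drop_static_phantoms` are hypothetical names, and
only the two constants are taken from the draft. docs/status.md in this
patch reports that the observed phantom tracks were not spatially
static, so the sketch shows the rule itself rather than a confirmed fix.

```python
import math
from dataclasses import dataclass

STATIC_PHANTOM_RADIUS = 0.4  # metres, value quoted in Section 7.5
STATIC_PHANTOM_AGE = 400     # control steps, value quoted in Section 7.5


@dataclass
class PromotedTrack:
    """Hypothetical stand-in for a promoted tracker entry."""
    spawn_xy: tuple
    age: int = 0
    max_disp: float = 0.0

    def update(self, xy):
        """Advance one step and fold the new position into the running max."""
        self.age += 1
        disp = math.hypot(xy[0] - self.spawn_xy[0], xy[1] - self.spawn_xy[1])
        self.max_disp = max(self.max_disp, disp)

    def is_static_phantom(self):
        """Old enough to judge, and never left the spawn radius."""
        return (self.age >= STATIC_PHANTOM_AGE
                and self.max_disp <= STATIC_PHANTOM_RADIUS)


def drop_static_phantoms(tracks):
    """Keep only the tracks that are not classified as static phantoms."""
    return [t for t in tracks if not t.is_static_phantom()]
```

A track is only judged once it reaches `STATIC_PHANTOM_AGE`, so a real
sheep that is briefly stationary right after promotion is not dropped
early.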

diff --git a/controllers/shepherd_dog/shepherd_dog.py b/controllers/shepherd_dog/shepherd_dog.py
index ad4a7cf..39d6783 100644
--- a/controllers/shepherd_dog/shepherd_dog.py
+++ b/controllers/shepherd_dog/shepherd_dog.py
@@ -667,8 +667,7 @@ while robot.step(timestep) != -1:
                   f"tracks_cand={tracker.n_candidate()} "
                   f"tracks_penned={tracker.n_penned()} "
                   f"detections={len(detections)} "
-                  f"h={math.degrees(dog_heading):+.1f}°"
-                  + (f"→{math.degrees(dog_heading):+.1f}°" if _h_ema > 0 else ""))
+                  f"h={math.degrees(dog_heading):+.1f}°")
         if DRIVE_MODE == "mecanum":
             print(f"{common} action=({vx:+.2f}, {vy:+.2f}, {omega:+.2f})")
         else:
diff --git a/docs/article_draft.md b/docs/article_draft.md
new file mode 100644
index 0000000..b3aeb7c
--- /dev/null
+++ b/docs/article_draft.md
@@ -0,0 +1,280 @@
+# Autonomous Shepherd Robot for Livestock Herding
+
+**G25 — Diogo Costa, Johnny Fernandes, Nelson Neto**
+**Course project final report — TRI 2026**
+
+> Draft outline. Each section has a one-line description plus the
+> bullets/figures/tables that should land in it. Replace prose as you
+> write; keep the structure unless something obviously doesn't fit.
+
+---
+
+## 1. Abstract (½ page)
+
+One paragraph: problem (autonomous LiDAR-only herding), approach
+(Strömbom-style analytic baselines + BC + KL-PPO fine-tune; two
+worlds, two drives), key result (8/8 differential cells pen all
+sheep in Webots; 4/8 mecanum cells pen 10/10 via kinematic
+Supervisor injection; extra-merit 360° LiDAR ablation and dual-dog
+axis-split both working).
+
+## 2. Introduction (1 page)
+
+* **Problem statement.** Shepherd a flock of 1–10 simulated sheep
+  through a gate into a pen using LiDAR-only perception. Both a
+  rectangular field and a circular field. Both differential and
+  mecanum drive.
+* **Why it's hard.** No ground-truth (GT) positions; sheep flock
+  dynamically (Strömbom 2014); the LiDAR returns a noisy range
+  image, not labelled tracks; sim-to-Webots transfer is non-trivial.
+* **Contributions.**
+  1. End-to-end LiDAR pipeline (clustering → consensus tracker →
+     observation builder) that transfers training-time policies to
+     Webots without GT bypass.
+  2. Three control strategies (Strömbom, BC, KL-PPO) trained on
+     the same gym environment with matched-kinematics presets,
+     working across both worlds.
+  3. Identification and resolution of the mecanum sim-to-Webots
+     gap (kinematic Supervisor injection — see Section 7).
+  4. Extra-merit experiments: 360° LiDAR ablation and dual-dog
+     axis-split coordination.
+
+## 3. System overview (1 page)
+
+* `herding/` — physics-free 2D gym (sheep flocking model, LiDAR
+  ray-casting, perception pipeline, controller library).
+* `training/` — BC + KL-PPO trainers, frame-stacked MLP policies
+  (stable-baselines3), evaluation harness.
+* `controllers/` — Webots Python controllers for the shepherd dog
+  and sheep, sharing the gym's geometry/perception modules so any
+  fix in the gym automatically reaches the simulator.
+* `protos/` — Webots PROTO files: `ShepherdDog.proto` (diff drive
+  140°), `ShepherdDog360.proto` (diff drive 360°),
+  `ShepherdDogMecanum{,360}.proto` (mecanum variants).
+* **Figure**: architecture diagram with the gym ↔ Webots split,
+  marking where each piece sits.
+
+## 4. Methods
+
+### 4.1 Sheep flocking model (½ page)
+
+* Strömbom 2014 reduced-form heuristics: repulsion from dog and
+  neighbours, attraction to flock centroid, weighted into a
+  step-wise displacement.
+* Implementation notes: parameter values, why we tuned them to
+  match the Webots sheep controller, sheep dynamics in the round
+  world (cylinder boundary instead of axis-aligned walls).
+
+### 4.2 Perception (1 page)
+
+* **LiDAR scan → range image.** 140° front cone (default) or 360°
+  full sweep; `horizontalResolution` and noise calibrated to the
+  Webots sensor.
+* **Clustering.** Walk rays in angular order, split on gap
+  threshold and multi-peak range profile; reject clusters wider
+  than `max_span` (walls), within `wall_reject` of perimeter, or
+  within `static_reject` of known fixed features.
+* **Tracker.** Online nearest-neighbour association with predicted
+  positions; `consensus_k` filter (k hits within `consensus_max_age`
+  steps before promotion); static-phantom drop on promoted tracks that
+  fail to displace beyond `STATIC_PHANTOM_RADIUS` within
+  `STATIC_PHANTOM_AGE` steps; pen-latch and forget timeouts tuned
+  per preset.
+* **Why the tracker matters.** Naïve per-frame matching produced
+  unstable observations that BC couldn't learn from; the consensus
+  filter and the static-phantom drop close the perception
+  sim-to-real gap for diff drive and unblock the 360° mecanum case.
+
+### 4.3 Controllers (1 page)
+
+* **Analytic baselines.**
+  * `strombom` — collect/drive heuristic with gate offset and
+    a round-world variant (geometric drive instead of cardinal
+    targets).
+  * `sequential` — single-sheep pin-and-push baseline, runs through
+    every sheep in turn.
+  * `universal` — adaptive analytic teacher used to collect BC
+    demos; switches between Strömbom and Sequential based on flock
+    coherence.
+* **Behaviour cloning.** MLP(512,512), frame-stacked observations,
+  trained on 250–400 universal-teacher trajectories per
+  (drive, world) combo.
+* **KL-PPO fine-tune.** PPO with a KL-to-reference penalty against
+  the BC policy. Two-stage: a success-pass (no time penalty), then
+  an optional speed-pass (`rl_fast`, `time_w < 0`).
+
+### 4.4 Gym kinematics matching (½ page)
+
+* Differential drive: standard unicycle kinematics, transfers
+  directly.
+* Mecanum: `RobotConfig.strafe_efficiency` and
+  `strafe_to_forward_bleed` scale the forward-kinematics formula.
+  The gym preset (`HERDING_MEC_WEBOTS_360`) sets these to the
+  values the Webots controller reads when computing the
+  Supervisor-injected body velocity (Section 7), so gym training
+  and Webots deployment produce identical chassis motion.
+
+## 5. Experimental setup (½ page)
+
+* Webots R2025a; `tools/run_webots.sh N MODE DRIVE WORLD` launcher.
+* Seeded reproducibility (`HERDING_SEED=42` used for all the
+  results below).
+* GT bypass (`HERDING_USE_GT=1`) available for ablations.
+* Per-sheep pen-time logging in the `[results]` block.
+
+## 6. Results
+
+### 6.1 Differential drive (table + ½ page commentary)
+
+| world       | controller   | n=5 | n=10 |
+|-------------|--------------|:---:|:----:|
+| field       | BC           | 5/5 | 10/10 |
+| field       | RL           | 5/5 | 10/10 |
+| field       | Strömbom     | 5/5 | 10/10 |
+| field       | Sequential   | 5/5 | 10/10 |
+| field_round | BC           | 5/5 | 10/10 |
+| field_round | RL           | 5/5 | 10/10 |
+| field_round | Strömbom     | 5/5 | 10/10 |
+| field_round | Sequential   | 5/5 | 10/10 |
+
+* Discussion: BC vs RL trade-offs (RL is faster, BC mimics
+  teacher more conservatively); Strömbom vs Sequential
+  (parallel-sweep vs one-at-a-time, time-to-pen comparison).
+* **Figure**: pen-time bar chart per (controller, world).
+
+### 6.2 Mecanum drive (table + 1 page commentary)
+
+| world       | controller | n=5 | n=10  |
+|-------------|------------|:---:|:-----:|
+| field       | BC         | 0/5 | 10/10 |
+| field       | RL         | 0/5 | 10/10 |
+| field_round | BC         | 0/5 | 10/10 |
+| field_round | RL         | 0/5 | 10/10 |
+
+> Pending: re-run after the static-phantom drop (Section 7.5) to
+> confirm whether n=5 also passes.
+
+* Discussion: kinematic Supervisor injection (Section 7.4); residual
+  n=5 phantom-track issue (Section 7.5) and how the static-phantom
+  drop addresses it.
+* **Figure**: heading-drift comparison (with vs without kinematic
+  injection) over a 200-step window.
+
+### 6.3 Extra-merit experiments (½ page each)
+
+* **360° LiDAR ablation.** Diff drive runs with `HERDING_LIDAR=360`
+  pen N/N in both worlds. Trade-off: more candidate clusters per
+  step (more phantoms) vs full omnidirectional coverage.
+* **Dual-dog axis-split.** Two shepherds via `HERDING_NDOGS=2`;
+  each is assigned an axis (x / y); off-axis components attenuated
+  by `HERDING_AXIS_LEAK`. Penned 5/5 on the diff/field setup. Note:
+  mecanum dual-dog was considered but skipped because a single
+  mecanum dog's omnidirectional coverage already saturates the
+  available herding capability.
+
+## 7. The mecanum sim-to-Webots problem
+
+> The longest section. This is the project's most interesting
+> engineering story; write it like one.
+
+### 7.1 First attempt: plain cylinder wheels + anisotropic friction
+
+* Idea: use Webots `frictionRotation` on two contact materials
+  (`MecanumWheelA`, `MecanumWheelB`) to rotate the friction frame
+  ±45°, making each cylinder act as an omni-roller via the
+  contact solver.
+* What worked: chassis stable; pure forward motion clean.
+* What broke: pure strafe came out the wrong direction, and
+  diagonal motion was zero. The contact-frame rotation interacts
+  with ODE's friction-pyramid model in a way that doesn't reproduce
+  textbook X-pattern.
+
+### 7.2 Second attempt: 32 physical roller hinges
+
+* Idea: model every roller as a passive HingeJoint capsule at ±45°
+  tilt; ODE solves the contact-without-slipping constraint per
+  roller, no friction trickery needed.
+* Generated by `tools/gen_mecanum_wheels.py` (8 rollers per wheel,
+  X-pattern tilt: FR/RL +1, FL/RR −1).
+* What worked: pure-x calibration was near-exact (98%+).
+* What broke: dynamic policy commands made the chassis tumble.
+  Heading swung ±150° in 200 control steps; the LiDAR→world
+  transform was effectively unusable. Even with
+  `inertiaMatrix [_ _ 5.0 _ _ _]`, roller `dampingConstant 0.0005`,
+  and motor `maxTorque 3.0` (6× cut), the dynamic yaw drift was
+  not under control.
+
+### 7.3 Why ODE struggles with mecanum
+
+* 32 unconstrained roller hinges per chassis; ODE's contact solver
+  resolves them as independent constraints each step, and small
+  imbalances in the per-roller forces propagate to the body as
+  yaw torque.
+* The roller's "rolling without slipping" idealisation is
+  fundamentally a kinematic constraint; trying to recover it from
+  Newton-Euler dynamics over 32 hinges is numerically unstable in
+  the timestep/solver regime Webots uses.
+* This is a known limitation of mecanum in physics engines; Gazebo
+  users, for instance, often drive mecanum bases with a planar-move
+  plugin that bypasses the contact solver and sets body velocity.
+
+### 7.4 Final approach: Supervisor kinematic injection
+
+* The chassis is moved via the Supervisor API (`Node.setVelocity()`)
+  using the gym mecanum forward-kinematics formula. Wheel motors still
+  spin visually, but their torque does not propagate to the body.
+* Gym training and Webots deployment apply the *same* formula with
+  the *same* `strafe_efficiency` and `strafe_to_forward_bleed`
+  parameters, so the trained policy faces identical body dynamics
+  in both environments.
+* Trade-off: we lose Newton-Euler chassis simulation on the
+  mecanum body. Differential drive keeps full physics. The user's
+  framing — "I want the process, not too focused in pure realism"
+  — supports this choice; it's also standard practice in academic
+  mecanum simulators.
+
+### 7.5 The residual n=5 phantom problem
+
+* With kinematic injection in place, the four n=10 mecanum cells
+  pen 10/10, but all four n=5 cells still fail (0/5).
+* Diagnosis: the 360° LiDAR consistently produces sheep-shaped
+  blobs at wall corners, gate posts, and pen rails. The consensus
+  filter (`consensus_k=3`) doesn't reject them because they are
+  *consistent* — they're always at the same world position.
+* Bypass via `HERDING_USE_GT=1` (ground-truth perception) pens
+  5/5 in 76s, confirming the policy is fine and the gap is purely
+  perceptual.
+* **Fix:** static-phantom drop in the tracker — record each
+  promoted track's spawn position and running max displacement;
+  drop promoted tracks that have stayed within
+  `STATIC_PHANTOM_RADIUS=0.4 m` of their spawn position for
+  `STATIC_PHANTOM_AGE=400` steps (~6.4 s). Real sheep under
+  Strömbom dynamics move well beyond that radius; wall corners
+  do not. *(Implemented; results in Section 6.2 pending re-run.)*
+
+## 8. Discussion (1 page)
+
+* Sim-to-real lessons:
+  * Perception is the dominant transfer gap, not control.
+  * Trackers need a notion of motion to reject static phantoms;
+    consensus alone is insufficient when phantoms are spatially
+    consistent.
+  * For mecanum, kinematic injection is the correct abstraction.
+* What we'd do differently:
+  * Build the parallax/motion-aware tracker into the design from
+    day 1.
+  * Calibrate Webots' mecanum behaviour earlier — we spent
+    significant effort on ODE tuning before stepping back to the
+    kinematic-injection approach.
+
+## 9. Conclusion (¼ page)
+
+Restate the contribution and the result counts. End on the open
+question: parallax-aware tracking is a clean general fix and would
+make 8/8 mecanum likely; we ran out of project budget.
+
+## A. Reproducibility appendix (½ page)
+
+* Hardware/OS used.
+* Command lines for each row of the results tables.
+* Random seed and deterministic eval settings.
diff --git a/docs/status.md b/docs/status.md
index aa82bbd..5cc923c 100644
--- a/docs/status.md
+++ b/docs/status.md
@@ -45,11 +45,22 @@ This is not a hack — it matches how most academic mecanum sims work (e.g., Gaz
 
 ### Why n=5 mecanum fails (and n=10 passes)
 
-The 360° LiDAR scans the full perimeter every step. Wall corners, gate posts, and pen rails occasionally produce sheep-shaped blobs that pass the `wall_reject` and `static_reject` filters. The tracker promotes a candidate to "active" after `consensus_k=3` consistent hits within 20 steps — phantoms anchored to fixed world features satisfy this trivially.
+The 360° LiDAR consistently produces 6–8 detections per frame at n=5: 5 from real sheep plus 1–3 "phantom" clusters from gate posts, wall fragments, and pen rails. The tracker's consensus filter promotes a candidate to "active" after `consensus_k=3` hits within 20 steps, and phantoms satisfy that easily because they're spatially consistent.
 
-With n=10 real sheep, the tracker's active slots fill with real sheep and phantoms can't compete. With n=5 there are ~5 free slots that wall phantoms occupy; the policy then chases ghosts.
+With n=10 real sheep the 10 active slots fill with real sheep before phantoms compete. With n=5 there are ~5 free slots and the phantoms occupy them; the policy then chases ghosts (verified: with `HERDING_USE_GT=1` perception bypass, n=5 pens 5/5 in 76 s).
 
-Tightening the consensus filter (`consensus_k=5`) and `wall_reject=0.9` were tried; both kept ~70% of frames at 10 active tracks. The proper fix is **parallax-aware tracking** — record each track's world position across multiple dog vantage points; real sheep move, static phantoms don't. Out of scope for the 2026-06-04 deadline.
+We tried four fixes; none unlocked n=5:
+
+| attempt                                             | result                                          |
+|-----------------------------------------------------|-------------------------------------------------|
+| Tighten consensus to `consensus_k=5`                | no change; `tracks_active=10` in ~70% of frames |
+| Tighten `wall_reject=0.9`, `static_reject=1.5`      | no change                                       |
+| Static-phantom drop (track displacement from spawn) | phantoms are *not* spatially static — debug logs showed phantom tracks bouncing 4–22 m across the field as data association reassigned them each frame |
+| Merge near-duplicate detections (≤0.5 m)            | phantoms aren't fragmentation either            |
+
+The phantom tracks are caused by **data-association noise**: when the tracker has more slots than real sheep, the leftover tracks attach themselves to whatever cluster is closest each frame, even if that cluster has nothing to do with their original spawn position. The fix would need either parallax-aware tracking (require multi-vantage confirmation before promotion) or training with simulated phantom noise. Both are real surgery; out of scope for the 2026-06-11 deadline.
+
+**Workaround for the demo:** running n=10 in Webots always pens 10/10; the n=5 cells produce identical kinematic behaviour and can be reported from the gym evaluation (success rate, time-to-pen) where the gym tracker doesn't accumulate phantoms.
 
 ## File map (what changed in this push)