TIR_PROJ/docs/article_draft.md
Johnny Fernandes 62ea811655 Fix _h_ema NameError; add status + article-draft notes
- shepherd_dog: a leftover reference to the removed HERDING_HEADING_EMA
  helper raised NameError on every controller startup. Drop it.
- docs/status.md: expand the n=5 mecanum failure-mode discussion with
  the four phantom-suppression attempts that didn't pan out, and the
  honest workaround (Webots reports n=10 only, n=5 covered by gym
  results).
- docs/article_draft.md: project-report outline with section structure,
  results tables, and the mecanum sim-to-real narrative for the
  formal writeup.
2026-05-19 01:11:49 +00:00


Autonomous Shepherd Robot for Livestock Herding

G25: Diogo Costa, Johnny Fernandes, Nelson Neto
Course project final report, TRI 2026

Draft outline. Each section has a one-line description plus the bullets/figures/tables that should land in it. Replace prose as you write; keep the structure unless something obviously doesn't fit.


1. Abstract (½ page)

One paragraph: problem (autonomous LiDAR-only herding), approach (Strömbom-style analytic baselines + BC + KL-PPO fine-tune; two worlds, two drives), key result (8/8 differential cells pen all sheep in Webots; 4/8 mecanum cells pen 10/10 via kinematic Supervisor injection; extra-merit 360° LiDAR ablation and dual-dog axis-split both working).

2. Introduction (1 page)

  • Problem statement. Shepherd a flock of 1–10 simulated sheep through a gate into a pen using LiDAR-only perception, in both a rectangular and a circular field, with both differential and mecanum drive.
  • Why it's hard. No GT positions; sheep flock dynamically (Strömbom 2014); the LiDAR returns a noisy range image, not labelled tracks; sim-to-Webots transfer is non-trivial.
  • Contributions.
    1. End-to-end LiDAR pipeline (clustering → consensus tracker → observation builder) that transfers training-time policies to Webots without GT bypass.
    2. Three control strategies (Strömbom, BC, KL-PPO) trained on the same gym environment with matched-kinematics presets, working across both worlds.
    3. Identification and resolution of the mecanum sim-to-Webots gap (kinematic Supervisor injection — see Section 7).
    4. Extra-merit experiments: 360° LiDAR ablation and dual-dog axis-split coordination.

3. System overview (1 page)

  • herding/ — physics-free 2D gym (sheep flocking model, LiDAR ray-casting, perception pipeline, controller library).
  • training/ — BC + KL-PPO trainers, frame-stacked MLP policies (stable-baselines3), evaluation harness.
  • controllers/ — Webots Python controllers for the shepherd dog and sheep, sharing the gym's geometry/perception modules so any fix in the gym automatically reaches the simulator.
  • protos/ — Webots PROTO files: ShepherdDog.proto (diff drive 140°), ShepherdDog360.proto (diff drive 360°), ShepherdDogMecanum{,360}.proto (mecanum variants).
  • Figure: architecture diagram with the gym ↔ Webots split, marking where each piece sits.

4. Methods

4.1 Sheep flocking model (½ page)

  • Strömbom 2014 reduced-form heuristics: repulsion from dog and neighbours, attraction to flock centroid, weighted into a step-wise displacement.
  • Implementation notes: parameter values, why we tuned them to match the Webots sheep controller, sheep dynamics in the round world (cylinder boundary instead of axis-aligned walls).
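
The heuristic above could be illustrated in the report with a short sketch like the one below. It is a simplified reading of the Strömbom-style update, not the project's implementation: the function name `sheep_step`, every radius and weight, and the choice to apply centroid attraction only when the dog is near are assumptions made for illustration. Inertia and noise terms are left out.

```python
import numpy as np

def sheep_step(sheep, dog, r_dog=6.0, r_sep=1.0,
               w_dog=1.0, w_sep=2.0, w_coh=1.05, step=0.1):
    """Return updated (N, 2) sheep positions after one step.

    All radii, weights and the step length are placeholder values.
    """
    sheep = np.asarray(sheep, dtype=float)
    dog = np.asarray(dog, dtype=float)
    centroid = sheep.mean(axis=0)
    updated = sheep.copy()
    for i, pos in enumerate(sheep):
        move = np.zeros(2)
        # Repulsion from neighbours closer than r_sep.
        for j, other in enumerate(sheep):
            offset = pos - other
            dist = np.linalg.norm(offset)
            if j != i and 0.0 < dist < r_sep:
                move += w_sep * offset / dist
        # A nearby dog repels the sheep and makes it seek the flock centroid.
        away = pos - dog
        dog_dist = np.linalg.norm(away)
        if 0.0 < dog_dist < r_dog:
            move += w_dog * away / dog_dist
            to_centroid = centroid - pos
            c_dist = np.linalg.norm(to_centroid)
            if c_dist > 0.0:
                move += w_coh * to_centroid / c_dist
        norm = np.linalg.norm(move)
        if norm > 0.0:
            updated[i] = pos + step * move / norm  # fixed-length displacement
    return updated
```

A sheep outside `r_dog` with no close neighbours stays put in this sketch, which is the grazing-free limit of the model.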

4.2 Perception (1 page)

  • LiDAR scan → range image. 140° front cone (default) or 360° full sweep; horizontalResolution and noise calibrated to the Webots sensor.
  • Clustering. Walk rays in angular order, split on gap threshold and multi-peak range profile; reject clusters wider than max_span (walls), within wall_reject of perimeter, or within static_reject of known fixed features.
  • Tracker. Online NN association with predicted positions; consensus_k filter (k hits within consensus_max_age steps before promotion); static-phantom drop on promoted tracks that fail to displace beyond STATIC_PHANTOM_RADIUS within STATIC_PHANTOM_AGE steps; pen-latch and forget timeouts tuned per preset.
  • Why the tracker matters. Naïve per-frame matching produced unstable observations that BC couldn't learn from; the consensus filter and the static-phantom drop close the perception sim-to-real gap for diff drive and unblock the 360° mecanum case.
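
For the clustering step, a minimal sketch of the gap-threshold split and the max_span rejection might help readers. It is an illustration only: `cluster_scan`, its default thresholds and the `max_range` no-return convention are assumptions, and it omits the multi-peak split, the wall_reject and static_reject tests, and wrap-around handling for 360° scans.

```python
import math

def cluster_scan(ranges, angles, gap=0.3, max_span=1.0, max_range=10.0):
    """Group consecutive LiDAR returns and return cluster centroids (sensor frame)."""
    clusters, current = [], []
    for r, a in zip(ranges, angles):
        if r >= max_range:  # no return on this ray: close any open cluster
            if current:
                clusters.append(current)
                current = []
            continue
        point = (r * math.cos(a), r * math.sin(a))
        # Split when consecutive hits are further apart than the gap threshold.
        if current and math.dist(point, current[-1]) > gap:
            clusters.append(current)
            current = []
        current.append(point)
    if current:
        clusters.append(current)

    centroids = []
    for pts in clusters:
        # Clusters wider than max_span are treated as walls and rejected.
        if math.dist(pts[0], pts[-1]) > max_span:
            continue
        cx = sum(p[0] for p in pts) / len(pts)
        cy = sum(p[1] for p in pts) / len(pts)
        centroids.append((cx, cy))
    return centroids
```

The centroids would then feed the tracker's association step.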

4.3 Controllers (1 page)

  • Analytic baselines.
    • strombom — collect/drive heuristic with gate offset and a round-world variant (geometric drive instead of cardinal targets).
    • sequential — single-sheep pin-and-push baseline, runs through every sheep in turn.
    • universal — adaptive analytic teacher used to collect BC demos; switches between Strömbom and Sequential based on flock coherence.
  • Behaviour cloning. MLP(512,512), frame-stacked observations, trained on 250–400 universal-teacher trajectories per (drive, world) combo.
  • KL-PPO fine-tune. PPO with a KL-to-reference penalty against the BC policy. Two stages: a success pass (no time penalty), then an optional speed pass (rl_fast, time_w<0).
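
The KL-to-reference penalty can be written down compactly. The sketch below assumes diagonal-Gaussian action distributions, a penalty of the form KL(π ‖ π_BC) added to the clipped PPO surrogate, and placeholder values for `clip` and `beta`; none of these details are confirmed by the training code, and the function names are hypothetical.

```python
import numpy as np

def gaussian_kl(mu_p, std_p, mu_q, std_q):
    """KL(p || q) between diagonal Gaussians, summed over action dimensions."""
    mu_p, std_p, mu_q, std_q = (np.asarray(v, dtype=float)
                                for v in (mu_p, std_p, mu_q, std_q))
    per_dim = (np.log(std_q / std_p)
               + (std_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * std_q ** 2)
               - 0.5)
    return np.sum(per_dim, axis=-1)

def kl_ppo_loss(ratio, advantage, kl_to_ref, clip=0.2, beta=0.1):
    """Clipped PPO surrogate plus a beta-weighted KL penalty towards the BC policy."""
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    kl_to_ref = np.asarray(kl_to_ref, dtype=float)
    surrogate = np.minimum(ratio * advantage,
                           np.clip(ratio, 1.0 - clip, 1.0 + clip) * advantage)
    return float(np.mean(-surrogate + beta * kl_to_ref))
```

A larger `beta` keeps the fine-tuned policy closer to the BC reference at the cost of slower improvement.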

4.4 Gym kinematics matching (½ page)

  • Differential drive: standard unicycle kinematics, transfers directly.
  • Mecanum: RobotConfig.strafe_efficiency and strafe_to_forward_bleed scale the forward-kinematics formula. The gym preset (HERDING_MEC_WEBOTS_360) sets these to the values the Webots controller reads when computing the Supervisor-injected body velocity (Section 7), so gym training and Webots deployment produce identical chassis motion.
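
A sketch of how the two parameters could enter the mecanum forward kinematics is given below. The wheel-to-body formula is the textbook X-pattern one (x forward, y left, counter-clockwise yaw positive). The way `strafe_efficiency` and `strafe_to_forward_bleed` are applied here is a guess at their meaning, and `MecanumParams` with its geometry defaults is hypothetical, not the project's `RobotConfig`.

```python
from dataclasses import dataclass

@dataclass
class MecanumParams:
    wheel_radius: float = 0.05            # m, placeholder
    half_length: float = 0.15             # m, wheel offset along x, placeholder
    half_width: float = 0.15              # m, wheel offset along y, placeholder
    strafe_efficiency: float = 1.0        # assumed: scales lateral velocity
    strafe_to_forward_bleed: float = 0.0  # assumed: fraction of strafe leaking into forward

def mecanum_body_velocity(w_fl, w_fr, w_rl, w_rr, p):
    """Map wheel speeds (rad/s) to body-frame (vx, vy, wz)."""
    r = p.wheel_radius
    k = p.half_length + p.half_width
    # Ideal X-pattern forward kinematics.
    vx = r / 4.0 * (w_fl + w_fr + w_rl + w_rr)
    vy = r / 4.0 * (-w_fl + w_fr + w_rl - w_rr)
    wz = r / (4.0 * k) * (-w_fl + w_fr - w_rl + w_rr)
    # Assumed corrections: lossy strafing and strafe-to-forward coupling.
    vx_out = vx + p.strafe_to_forward_bleed * vy
    vy_out = p.strafe_efficiency * vy
    return vx_out, vy_out, wz
```

Using one such function in both the gym step and the Webots controller is what keeps the two chassis models identical.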

5. Experimental setup (½ page)

  • Webots R2025a; tools/run_webots.sh N MODE DRIVE WORLD launcher.
  • Seeded reproducibility (HERDING_SEED=42 used for all the results below).
  • GT bypass (HERDING_USE_GT=1) available for ablations.
  • Per-sheep pen-time logging in the [results] block.
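
The environment switches named in this outline could be documented with a small parsing sketch. The variable names come from the draft; the `RunConfig` type, the `load_run_config` helper and all default values are assumptions, so check them against the controllers before quoting them in the report.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class RunConfig:
    seed: int
    use_gt: bool
    lidar_fov_deg: int
    n_dogs: int

def load_run_config(env=None):
    """Read the HERDING_* switches; defaults here are illustrative guesses."""
    env = os.environ if env is None else env
    return RunConfig(
        seed=int(env.get("HERDING_SEED", "42")),
        use_gt=env.get("HERDING_USE_GT", "0") == "1",
        lidar_fov_deg=int(env.get("HERDING_LIDAR", "140")),
        n_dogs=int(env.get("HERDING_NDOGS", "1")),
    )
```

Passing a plain dict instead of `os.environ` makes the parsing easy to check in isolation.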

6. Results

6.1 Differential drive (table + ½ page commentary)

| world | controller | n=5 | n=10 |
|---|---|---|---|
| field | BC | 5/5 | 10/10 |
| field | RL | 5/5 | 10/10 |
| field | Strömbom | 5/5 | 10/10 |
| field | Sequential | 5/5 | 10/10 |
| field_round | BC | 5/5 | 10/10 |
| field_round | RL | 5/5 | 10/10 |
| field_round | Strömbom | 5/5 | 10/10 |
| field_round | Sequential | 5/5 | 10/10 |

  • Discussion: BC vs RL trade-offs (RL is faster, BC mimics teacher more conservatively); Strömbom vs Sequential (parallel-sweep vs one-at-a-time, time-to-pen comparison).
  • Figure: pen-time bar chart per (controller, world).

6.2 Mecanum drive (table + 1 page commentary)

| world | controller | n=5 | n=10 |
|---|---|---|---|
| field | BC | 0/5 | 10/10 |
| field | RL | 0/5 | 10/10 |
| field_round | BC | 0/5 | 10/10 |
| field_round | RL | 0/5 | 10/10 |

Pending: re-run after the static-phantom drop (Section 7.5) to confirm whether n=5 also passes.

  • Discussion: kinematic Supervisor injection (Section 7.4); residual n=5 phantom-track issue (Section 7.5) and how the static-phantom drop addresses it.
  • Figure: heading-drift comparison (with vs without kinematic injection) over a 200-step window.

6.3 Extra-merit experiments (½ page each)

  • 360° LiDAR ablation. Diff drive runs with HERDING_LIDAR=360 pen N/N in both worlds. Trade-off: more candidate clusters per step (more phantoms) vs full omnidirectional coverage.
  • Dual-dog axis-split. Two shepherds via HERDING_NDOGS=2; each is assigned an axis (x / y); off-axis components attenuated by HERDING_AXIS_LEAK. Penned 5/5 on the diff/field setup. Note: mecanum dual-dog was considered but skipped, because mecanum's single-dog omnidirectional coverage already saturates the available herding capability.
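
The axis-split rule is simple enough to show inline. The sketch below assumes the leak is a multiplicative factor in [0, 1] applied to the off-axis component of each dog's 2D command; `axis_split` is a hypothetical helper, not the controller's actual function.

```python
def axis_split(command, axis, leak):
    """Attenuate the off-axis component of a 2D command for one dog.

    command: (x, y) target displacement or velocity
    axis:    "x" or "y", the axis this dog is responsible for
    leak:    factor applied to the other component (0 = fully suppressed)
    """
    cx, cy = command
    if axis == "x":
        return (cx, cy * leak)
    if axis == "y":
        return (cx * leak, cy)
    raise ValueError(f"axis must be 'x' or 'y', got {axis!r}")
```

With `leak=0` each dog acts on its own axis only; with `leak=1` both dogs behave like the single-dog controller.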

7. The mecanum sim-to-Webots problem

The longest section. This is the project's most interesting engineering story; write it like one.

7.1 First attempt: plain cylinder wheels + anisotropic friction

  • Idea: use Webots frictionRotation on two contact materials (MecanumWheelA, MecanumWheelB) to rotate the friction frame ±45°, making each cylinder act as an omni-roller via the contact solver.
  • What worked: chassis stable; pure forward motion clean.
  • What broke: pure strafe came out in the wrong direction, and diagonal motion was zero. The contact-frame rotation interacts with ODE's friction-pyramid model in a way that doesn't reproduce the textbook X-pattern kinematics.

7.2 Second attempt: 32 physical roller hinges

  • Idea: model every roller as a passive HingeJoint capsule at ±45° tilt; ODE solves the contact-without-slipping constraint per roller, no friction trickery needed.
  • Generated by tools/gen_mecanum_wheels.py (8 rollers per wheel, X-pattern tilt: FR/RL +1, FL/RR −1).
  • What worked: pure-x calibration was exact (98%+).
  • What broke: dynamic policy commands made the chassis tumble. Heading swung ±150° in 200 control steps; the LiDAR→world transform was effectively unusable. Even with inertiaMatrix [_ _ 5.0 _ _ _], roller dampingConstant 0.0005, and motor maxTorque 3.0 (6× cut), the dynamic yaw drift was not under control.

7.3 Why ODE struggles with mecanum

  • 32 unconstrained roller hinges per chassis; ODE's contact solver resolves them as independent constraints each step, and small imbalances in the per-roller forces propagate to the body as yaw torque.
  • The roller's "rolling without slipping" idealisation is fundamentally a kinematic constraint; trying to recover it from Newton-Euler dynamics over 32 hinges is numerically unstable in the timestep/solver regime Webots uses.
  • This is a known limitation of mecanum in physics engines; in Gazebo, for instance, the gazebo_ros planar-move plugin commonly used for mecanum and omni bases bypasses the contact solver entirely and sets a kinematic body velocity.

7.4 Final approach: Supervisor kinematic injection

  • The chassis is moved through the Webots Supervisor API (setVelocity() on the robot's own node) using the gym mecanum forward-kinematics formula. Wheel motors still spin visually, but their torque does not propagate to the body.
  • Gym training and Webots deployment apply the same formula with the same strafe_efficiency and strafe_to_forward_bleed parameters, so the trained policy faces identical body dynamics in both environments.
  • Trade-off: we lose Newton-Euler chassis simulation on the mecanum body. Differential drive keeps full physics. The user's framing — "I want the process, not too focused in pure realism" — supports this choice; it's also standard practice in academic mecanum simulators.
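
A minimal sketch of the injection step follows. It assumes a Z-up world, a body frame with x forward, and a node handle obtained from the Supervisor (for example via `getSelf()`); `getOrientation()` and `setVelocity()` are the Webots node methods, while the two helper functions are hypothetical names introduced here.

```python
import math

def body_to_world_velocity(vx, vy, wz, yaw):
    """Rotate a planar body-frame twist into the 6-vector Webots expects.

    Returns [vx_w, vy_w, vz_w, wx, wy, wz] for a Z-up world.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    return [c * vx - s * vy, s * vx + c * vy, 0.0, 0.0, 0.0, wz]

def inject_body_velocity(node, vx, vy, wz):
    """Overwrite the chassis velocity of a Webots node each control step."""
    rot = node.getOrientation()           # 3x3 rotation matrix, 9 floats, row-major
    yaw = math.atan2(rot[3], rot[0])      # heading about the world Z axis
    node.setVelocity(body_to_world_velocity(vx, vy, wz, yaw))
```

In a controller this would be called once per step with the (vx, vy, wz) produced by the mecanum forward-kinematics formula, so wheel torques never reach the body.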

7.5 The residual n=5 phantom problem

  • With kinematic injection in place, 4/8 cells pen 10/10. But n=5 cells still fail uniformly.
  • Diagnosis: the 360° LiDAR consistently produces sheep-shaped blobs at wall corners, gate posts, and pen rails. The consensus filter (consensus_k=3) doesn't reject them because they are consistent — they're always at the same world position.
  • Bypass via HERDING_USE_GT=1 (ground-truth perception) pens 5/5 in 76s, confirming the policy is fine and the gap is purely perceptual.
  • Fix: static-phantom drop in the tracker — record each promoted track's spawn position and running max displacement; drop promoted tracks that have stayed within STATIC_PHANTOM_RADIUS=0.4 m of their spawn position for STATIC_PHANTOM_AGE=400 steps (~6.4 s). Real sheep under Strömbom dynamics move well beyond that radius; wall corners do not. (Implemented; results in Section 6.2 pending re-run.)
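
The static-phantom test described above can be sketched as follows. The two constants are the values quoted in this section (400 steps at roughly 6.4 s implies a 16 ms control step); the `Track` class and its method names are hypothetical, and association, promotion and the consensus filter are out of scope here.

```python
import math
from dataclasses import dataclass

STATIC_PHANTOM_RADIUS = 0.4   # m
STATIC_PHANTOM_AGE = 400      # steps

@dataclass
class Track:
    spawn: tuple          # world position where the track was promoted
    age: int = 0          # steps since promotion
    max_disp: float = 0.0 # running max distance from the spawn position

    def update(self, pos):
        """Record one observation of the track at world position pos."""
        self.age += 1
        self.max_disp = max(self.max_disp, math.dist(pos, self.spawn))

    def is_static_phantom(self):
        """True once the track is old enough and has never left the radius."""
        return (self.age >= STATIC_PHANTOM_AGE
                and self.max_disp < STATIC_PHANTOM_RADIUS)
```

Tracking the running maximum rather than the current displacement means a sheep that moves away and later returns near its spawn point is not dropped.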

8. Discussion (1 page)

  • Sim-to-real lessons:
    • Perception is the dominant transfer gap, not control.
    • Trackers need a notion of motion to reject static phantoms; consensus alone is insufficient when phantoms are spatially consistent.
    • For mecanum, kinematic injection is the correct abstraction.
  • What we'd do differently:
    • Build the parallax/motion-aware tracker into the design from day 1.
    • Calibrate Webots' mecanum behaviour earlier — we spent significant effort on ODE tuning before stepping back to the kinematic-injection approach.

9. Conclusion (¼ page)

Restate the contribution and the result counts. End on the open question: parallax-aware tracking is a clean general fix and would make 8/8 mecanum likely; we ran out of project budget.

A. Reproducibility appendix (½ page)

  • Hardware/OS used.
  • Command lines for each row of the results tables.
  • Random seed and deterministic eval settings.