Fix _h_ema NameError; add status + article-draft notes
- shepherd_dog: a leftover reference to the removed HERDING_HEADING_EMA helper raised NameError on every controller startup. Drop it.
- docs/status.md: expand the n=5 mecanum failure-mode discussion with the four phantom-suppression attempts that didn't pan out, and the honest workaround (Webots reports n=10 only, n=5 covered by gym results).
- docs/article_draft.md: project-report outline with section structure, results tables, and the mecanum sim-to-real narrative for the formal writeup.
@@ -0,0 +1,280 @@
# Autonomous Shepherd Robot for Livestock Herding

**G25: Diogo Costa, Johnny Fernandes, Nelson Neto**
**Course project final report, TRI 2026**

> Draft outline. Each section has a one-line description plus the
> bullets/figures/tables that should land in it. Replace prose as you
> write; keep the structure unless something obviously doesn't fit.

---

## 1. Abstract (½ page)

One paragraph: problem (autonomous LiDAR-only herding), approach
(Strömbom-style analytic baselines + BC + KL-PPO fine-tune; two
worlds, two drives), key result (8/8 differential cells pen all
sheep in Webots; 4/8 mecanum cells pen 10/10 via kinematic
Supervisor injection; extra-merit 360° LiDAR ablation and dual-dog
axis-split both working).

## 2. Introduction (1 page)

* **Problem statement.** Shepherd a flock of 1–10 simulated sheep
  through a gate into a pen using LiDAR-only perception, in both a
  rectangular field and a circular field, with both differential and
  mecanum drive.
* **Why it's hard.** No ground-truth (GT) positions; sheep flock
  dynamically (Strömbom et al., 2014); the LiDAR returns a noisy range
  image, not labelled tracks; sim-to-Webots transfer is non-trivial.
* **Contributions.**
  1. End-to-end LiDAR pipeline (clustering → consensus tracker →
     observation builder) that transfers training-time policies to
     Webots without GT bypass.
  2. Three control strategies (Strömbom, BC, KL-PPO) trained on
     the same gym environment with matched-kinematics presets,
     working across both worlds.
  3. Identification and resolution of the mecanum sim-to-Webots
     gap (kinematic Supervisor injection; see Section 7).
  4. Extra-merit experiments: 360° LiDAR ablation and dual-dog
     axis-split coordination.

## 3. System overview (1 page)

* `herding/`: physics-free 2D gym (sheep flocking model, LiDAR
  ray-casting, perception pipeline, controller library).
* `training/`: BC + KL-PPO trainers, frame-stacked MLP policies
  (stable-baselines3), evaluation harness.
* `controllers/`: Webots Python controllers for the shepherd dog
  and sheep, sharing the gym's geometry/perception modules so any
  fix in the gym automatically reaches the simulator.
* `protos/`: Webots PROTO files: `ShepherdDog.proto` (differential
  drive, 140° LiDAR), `ShepherdDog360.proto` (differential drive,
  360° LiDAR), `ShepherdDogMecanum{,360}.proto` (mecanum variants).
* **Figure**: architecture diagram with the gym ↔ Webots split,
  marking where each piece sits.

## 4. Methods

### 4.1 Sheep flocking model (½ page)

* Strömbom et al. (2014) reduced-form heuristics: repulsion from the
  dog and from neighbours, attraction to the flock centroid, weighted
  into a step-wise displacement.
* Implementation notes: parameter values, why we tuned them to
  match the Webots sheep controller, sheep dynamics in the round
  world (cylinder boundary instead of axis-aligned walls).
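
The weighted-displacement step described above can be sketched as follows. This is a minimal illustration only: the radii, weights, and step size are placeholder values, not the tuned project parameters, and the centroid is taken over all other sheep rather than a nearest-neighbour subset.

```python
import math

# Placeholder parameters (assumptions for illustration, not project values).
R_SHEEP = 2.0       # sheep-sheep repulsion radius [m]
R_DOG = 6.5         # distance at which a sheep reacts to the dog [m]
W_REPEL = 2.0       # weight of repulsion from close neighbours
W_CENTROID = 1.05   # weight of attraction to the flock centroid
W_DOG = 1.0         # weight of repulsion from the dog
STEP = 0.1          # displacement per step [m]


def _unit(dx, dy):
    """Normalise a 2-D vector; return (0, 0) for a near-zero vector."""
    n = math.hypot(dx, dy)
    return (dx / n, dy / n) if n > 1e-9 else (0.0, 0.0)


def sheep_step(i, sheep, dog):
    """Return the new (x, y) of sheep i after one heuristic step."""
    x, y = sheep[i]
    hx = hy = 0.0

    # Repulsion from neighbours that are too close.
    for j, (ox, oy) in enumerate(sheep):
        if j != i and math.hypot(x - ox, y - oy) < R_SHEEP:
            ux, uy = _unit(x - ox, y - oy)
            hx += W_REPEL * ux
            hy += W_REPEL * uy

    # When the dog is in range: flee from it and close up on the flock.
    if math.hypot(x - dog[0], y - dog[1]) < R_DOG:
        ux, uy = _unit(x - dog[0], y - dog[1])
        hx += W_DOG * ux
        hy += W_DOG * uy
        others = [p for j, p in enumerate(sheep) if j != i]
        if others:
            cx = sum(p[0] for p in others) / len(others)
            cy = sum(p[1] for p in others) / len(others)
            ux, uy = _unit(cx - x, cy - y)
            hx += W_CENTROID * ux
            hy += W_CENTROID * uy

    # Combine all terms into a fixed-length displacement.
    ux, uy = _unit(hx, hy)
    return (x + STEP * ux, y + STEP * uy)
```

A sheep with no dog in range and no crowding neighbour stays put in this sketch; the published model also includes inertia and noise terms, which are left out here.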

### 4.2 Perception (1 page)

* **LiDAR scan → range image.** 140° front cone (default) or 360°
  full sweep; `horizontalResolution` and noise calibrated to the
  Webots sensor.
* **Clustering.** Walk rays in angular order, split on gap
  threshold and multi-peak range profile; reject clusters wider
  than `max_span` (walls), within `wall_reject` of the perimeter, or
  within `static_reject` of known fixed features.
* **Tracker.** Online nearest-neighbour association with predicted
  positions; `consensus_k` filter (k hits within `consensus_max_age`
  steps before promotion); static-phantom drop on promoted tracks that
  fail to displace beyond `STATIC_PHANTOM_RADIUS` within
  `STATIC_PHANTOM_AGE` steps; pen-latch and forget timeouts tuned
  per preset.
* **Why the tracker matters.** Naïve per-frame matching produced
  unstable observations that BC couldn't learn from; the consensus
  filter and the static-phantom drop close the perception sim-to-real
  gap for differential drive and unblock the 360° mecanum case.
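
As a rough illustration of the clustering step, the sketch below splits an angularly ordered scan on range gaps and rejects clusters wider than a maximum span. The function name, default thresholds, and return format are assumptions for this example; the multi-peak split, the `wall_reject` and `static_reject` tests, and the 360° wrap-around are deliberately left out.

```python
import math


def cluster_scan(ranges, angles, gap=0.3, max_span=1.0, max_range=10.0):
    """Cluster an angularly ordered scan; return centroids in the sensor frame.

    ranges/angles are parallel lists (metres, radians). A new cluster starts
    whenever a ray has no return or the range jumps by more than `gap`.
    """
    clusters, current = [], []
    for r, a in zip(ranges, angles):
        if not (0.0 < r < max_range):      # no return: close any open cluster
            if current:
                clusters.append(current)
                current = []
            continue
        if current and abs(r - current[-1][0]) > gap:
            clusters.append(current)       # range discontinuity: split here
            current = []
        current.append((r, a))
    if current:
        clusters.append(current)

    centroids = []
    for c in clusters:
        pts = [(r * math.cos(a), r * math.sin(a)) for r, a in c]
        # Reject clusters whose end-to-end extent looks like a wall.
        if math.dist(pts[0], pts[-1]) > max_span:
            continue
        cx = sum(p[0] for p in pts) / len(pts)
        cy = sum(p[1] for p in pts) / len(pts)
        centroids.append((cx, cy))
    return centroids
```

The centroids would then be transformed to world coordinates with the dog's pose before being handed to the tracker.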

### 4.3 Controllers (1 page)

* **Analytic baselines.**
  * `strombom`: collect/drive heuristic with gate offset and
    a round-world variant (geometric drive instead of cardinal
    targets).
  * `sequential`: single-sheep pin-and-push baseline that runs through
    every sheep in turn.
  * `universal`: adaptive analytic teacher used to collect BC
    demos; switches between Strömbom and Sequential based on flock
    coherence.
* **Behaviour cloning.** MLP(512, 512), frame-stacked observations,
  trained on 250–400 universal-teacher trajectories per
  (drive, world) combination.
* **KL-PPO fine-tune.** PPO with a KL-to-reference penalty against
  the BC policy. Two stages: a success pass (no time penalty), then
  an optional speed pass (`rl_fast`, `time_w < 0`).
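
To make the KL-to-reference idea concrete, here is a per-sample sketch of a clipped PPO surrogate with an added penalty that keeps the fine-tuned policy close to the BC policy. The function names, the coefficient values, the diagonal-Gaussian policy form, and the KL direction (current policy against reference) are assumptions for illustration; the project's trainer may differ.

```python
import math


def gaussian_kl(mu_p, std_p, mu_q, std_q):
    """KL(p || q) between two diagonal Gaussians given as lists."""
    return sum(
        math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2.0 * sq ** 2) - 0.5
        for mp, sp, mq, sq in zip(mu_p, std_p, mu_q, std_q)
    )


def kl_ppo_loss(ratio, advantage, kl_to_ref, clip_eps=0.2, kl_coef=0.1):
    """Per-sample loss: negative clipped surrogate plus KL-to-reference penalty.

    ratio      -- pi_new(a|s) / pi_old(a|s)
    advantage  -- advantage estimate for the sample
    kl_to_ref  -- KL(pi_new(.|s) || pi_BC(.|s)) at the same state
    """
    clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    surrogate = min(ratio * advantage, clipped * advantage)
    return -surrogate + kl_coef * kl_to_ref
```

With `kl_coef = 0` this reduces to the usual clipped PPO objective; a larger coefficient pulls the policy back towards the BC behaviour.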

### 4.4 Gym kinematics matching (½ page)

* Differential drive: standard unicycle kinematics, transfers
  directly.
* Mecanum: `RobotConfig.strafe_efficiency` and
  `strafe_to_forward_bleed` scale the forward-kinematics formula.
  The gym preset (`HERDING_MEC_WEBOTS_360`) sets these to the
  values the Webots controller reads when computing the
  Supervisor-injected body velocity (Section 7.4), so gym training
  and Webots deployment produce identical chassis motion.
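
A minimal sketch of such a forward-kinematics function is given below. The wheel-to-body mapping is the textbook mecanum formula; how the two parameters enter (efficiency scaling the lateral velocity, bleed adding a forward component proportional to the ideal lateral velocity) is an assumption made for this example, as are the class name `RobotConfigSketch` and its geometry values.

```python
from dataclasses import dataclass


@dataclass
class RobotConfigSketch:
    """Hypothetical stand-in holding only the fields this example needs."""
    wheel_radius: float = 0.05           # [m], placeholder
    half_length: float = 0.15            # half wheelbase [m], placeholder
    half_width: float = 0.12             # half track width [m], placeholder
    strafe_efficiency: float = 1.0       # 1.0 = ideal strafe
    strafe_to_forward_bleed: float = 0.0 # 0.0 = no strafe-to-forward coupling


def mecanum_body_velocity(w_fl, w_fr, w_rl, w_rr, cfg):
    """Map wheel speeds [rad/s] to body velocity (vx, vy, wz).

    Convention: x forward, y left, wz counter-clockwise.
    """
    r = cfg.wheel_radius
    k = cfg.half_length + cfg.half_width
    vx = r / 4.0 * (w_fl + w_fr + w_rl + w_rr)
    vy = r / 4.0 * (-w_fl + w_fr + w_rl - w_rr)
    wz = r / (4.0 * k) * (-w_fl + w_fr - w_rl + w_rr)

    # Assumed use of the two tuning parameters (not confirmed by the source).
    vy_eff = cfg.strafe_efficiency * vy
    vx_eff = vx + cfg.strafe_to_forward_bleed * vy
    return vx_eff, vy_eff, wz
```

Because both the gym and the Webots controller would call the same function with the same configuration, any change to the parameters reaches both sides at once.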

## 5. Experimental setup (½ page)

* Webots R2025a; `tools/run_webots.sh N MODE DRIVE WORLD` launcher.
* Seeded reproducibility (`HERDING_SEED=42` used for all the
  results below).
* GT bypass (`HERDING_USE_GT=1`) available for ablations.
* Per-sheep pen-time logging in the `[results]` block.
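
For the appendix it may help to show how a controller could read these switches. The sketch below is hypothetical: the environment variable names come from this outline, but the `RunConfig` class, the parsing rules, and the fallback values (other than the 140° default cone) are assumptions.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    seed: int
    use_gt: bool
    lidar_fov_deg: int
    n_dogs: int


def load_run_config(env=None):
    """Build a RunConfig from environment variables (assumed defaults)."""
    env = os.environ if env is None else env
    return RunConfig(
        seed=int(env.get("HERDING_SEED", "42")),
        use_gt=env.get("HERDING_USE_GT", "0") == "1",
        lidar_fov_deg=int(env.get("HERDING_LIDAR", "140")),
        n_dogs=int(env.get("HERDING_NDOGS", "1")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to check without launching Webots.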

## 6. Results

### 6.1 Differential drive (table + ½ page commentary)

| world       | controller   | n=5 | n=10  |
|-------------|--------------|:---:|:-----:|
| field       | BC           | 5/5 | 10/10 |
| field       | RL           | 5/5 | 10/10 |
| field       | Strömbom     | 5/5 | 10/10 |
| field       | Sequential   | 5/5 | 10/10 |
| field_round | BC           | 5/5 | 10/10 |
| field_round | RL           | 5/5 | 10/10 |
| field_round | Strömbom     | 5/5 | 10/10 |
| field_round | Sequential   | 5/5 | 10/10 |

* Discussion: BC vs RL trade-offs (RL is faster, BC mimics the
  teacher more conservatively); Strömbom vs Sequential
  (parallel-sweep vs one-at-a-time, time-to-pen comparison).
* **Figure**: pen-time bar chart per (controller, world).

### 6.2 Mecanum drive (table + 1 page commentary)

| world       | controller | n=5 | n=10  |
|-------------|------------|:---:|:-----:|
| field       | BC         | 0/5 | 10/10 |
| field       | RL         | 0/5 | 10/10 |
| field_round | BC         | 0/5 | 10/10 |
| field_round | RL         | 0/5 | 10/10 |

> Pending: re-run after the static-phantom drop (Section 7.5) to
> confirm whether n=5 also passes.

* Discussion: kinematic Supervisor injection (Section 7.4); residual
  n=5 phantom-track issue (Section 7.5) and how the static-phantom
  drop addresses it.
* **Figure**: heading-drift comparison (with vs without kinematic
  injection) over a 200-step window.

### 6.3 Extra-merit experiments (½ page each)

* **360° LiDAR ablation.** Differential-drive runs with
  `HERDING_LIDAR=360` pen N/N in both worlds. Trade-off: more
  candidate clusters per step (more phantoms) vs full omnidirectional
  coverage.
* **Dual-dog axis-split.** Two shepherds via `HERDING_NDOGS=2`;
  each is assigned an axis (x / y); off-axis components are attenuated
  by `HERDING_AXIS_LEAK`. Penned 5/5 on the diff/field setup. Note:
  mecanum dual-dog was considered but skipped; mecanum's single-dog
  omnidirectional coverage already saturates the available
  herding capability.
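
The axis-split rule can be illustrated with a few lines. This sketch assumes the attenuation is applied to a 2-D world-frame motion command and that the leak value acts as a multiplicative factor between 0 and 1; neither detail is confirmed here, and the function name is hypothetical.

```python
def axis_split(command, axis, leak):
    """Keep the on-axis component of (cx, cy); scale the other one by `leak`."""
    cx, cy = command
    if axis == "x":
        return (cx, leak * cy)
    if axis == "y":
        return (leak * cx, cy)
    raise ValueError(f"unknown axis: {axis!r}")


# Two dogs receiving the same raw command, each keeping mostly its own axis.
raw = (1.0, 2.0)
dog_x_cmd = axis_split(raw, "x", leak=0.25)
dog_y_cmd = axis_split(raw, "y", leak=0.25)
```

A leak of 0 would confine each dog strictly to its axis, while a leak of 1 would disable the split altogether.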

## 7. The mecanum sim-to-Webots problem

> The longest section. This is the project's most interesting
> engineering story; write it like one.

### 7.1 First attempt: plain cylinder wheels + anisotropic friction

* Idea: use Webots `frictionRotation` on two contact materials
  (`MecanumWheelA`, `MecanumWheelB`) to rotate the friction frame
  ±45°, making each cylinder act as an omni-roller via the
  contact solver.
* What worked: chassis stable; pure forward motion clean.
* What broke: pure strafe came out in the wrong direction, and
  diagonal motion was zero. The contact-frame rotation interacts
  with ODE's friction-pyramid model in a way that does not reproduce
  the textbook X-pattern behaviour.

### 7.2 Second attempt: 32 physical roller hinges

* Idea: model every roller as a passive HingeJoint capsule at ±45°
  tilt; ODE solves the contact-without-slipping constraint per
  roller, no friction trickery needed.
* Generated by `tools/gen_mecanum_wheels.py` (8 rollers per wheel,
  X-pattern tilt: FR/RL +1, FL/RR −1).
* What worked: pure-x calibration was accurate (98%+).
* What broke: dynamic policy commands made the chassis tumble.
  Heading swung ±150° in 200 control steps; the LiDAR→world
  transform was effectively unusable. Even with
  `inertiaMatrix [_ _ 5.0 _ _ _]`, roller `dampingConstant 0.0005`,
  and motor `maxTorque 3.0` (6× cut), the dynamic yaw drift was
  not under control.
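
To give a feel for the roller layout, the sketch below computes the angular position and tilt of each roller for one wheel. It is not the contents of `tools/gen_mecanum_wheels.py`; the function name and wheel radius are placeholders, and only the roller count and the tilt-sign table are taken from the notes above.

```python
import math

# Tilt sign per wheel, as listed above (X-pattern).
TILT_SIGN = {"FR": +1, "RL": +1, "FL": -1, "RR": -1}


def roller_layout(wheel, n_rollers=8, wheel_radius=0.05):
    """Return one (phi, (u, v), tilt) tuple per roller of the given wheel.

    phi    -- angle of the roller around the axle [rad]
    (u, v) -- roller centre offset in the wheel plane [m]
    tilt   -- roller axis tilt relative to the axle [rad], +45 or -45 degrees
    """
    tilt = TILT_SIGN[wheel] * math.radians(45.0)
    layout = []
    for k in range(n_rollers):
        phi = 2.0 * math.pi * k / n_rollers
        centre = (wheel_radius * math.cos(phi), wheel_radius * math.sin(phi))
        layout.append((phi, centre, tilt))
    return layout


all_rollers = {w: roller_layout(w) for w in TILT_SIGN}
total = sum(len(v) for v in all_rollers.values())
```

Four wheels with eight rollers each give the 32 passive hinges that the contact solver has to resolve at every step.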

### 7.3 Why ODE struggles with mecanum

* 32 unconstrained roller hinges per chassis; ODE's contact solver
  resolves them as independent constraints each step, and small
  imbalances in the per-roller forces propagate to the body as
  yaw torque.
* The roller's "rolling without slipping" idealisation is
  fundamentally a kinematic constraint; trying to recover it from
  Newton-Euler dynamics over 32 hinges is numerically unstable in
  the timestep/solver regime Webots uses.
* This is a known limitation of mecanum in physics engines; a common
  workaround in Gazebo, for instance, is a planar-move style plugin
  that bypasses the contact solver and injects a kinematic body
  velocity.

### 7.4 Final approach: Supervisor kinematic injection

* The chassis is moved by `Supervisor.setVelocity()` using the gym
  mecanum forward-kinematics formula. Wheel motors still spin
  visually, but their torque does not propagate to the body.
* Gym training and Webots deployment apply the *same* formula with
  the *same* `strafe_efficiency` and `strafe_to_forward_bleed`
  parameters, so the trained policy faces identical body dynamics
  in both environments.
* Trade-off: we lose Newton-Euler chassis simulation on the
  mecanum body. Differential drive keeps full physics. The user's
  framing ("I want the process, not too focused in pure realism")
  supports this choice; it's also standard practice in academic
  mecanum simulators.
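
The injection step amounts to rotating the body-frame velocity into the world frame and handing the result to the Supervisor. The helper below is a sketch under two assumptions: a Z-up world with yaw measured about the z axis, and a vertical velocity forced to zero (a real controller might prefer to keep the current vertical component). The helper name is hypothetical; the commented call uses the Webots node-level `setVelocity`, which takes linear then angular velocity in world coordinates.

```python
import math


def body_to_world_velocity(vx_body, vy_body, wz, yaw):
    """Return [vx, vy, vz, wx, wy, wz] in the world frame for a planar twist."""
    c, s = math.cos(yaw), math.sin(yaw)
    vx_world = c * vx_body - s * vy_body
    vy_world = s * vx_body + c * vy_body
    return [vx_world, vy_world, 0.0, 0.0, 0.0, wz]


# Inside a Webots Supervisor controller (not executed here), roughly:
#   node = supervisor.getSelf()
#   node.setVelocity(body_to_world_velocity(vx, vy, wz, yaw))
```

Since the same body velocity comes out of the shared forward-kinematics formula in both environments, the only Webots-specific part is this frame rotation.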

### 7.5 The residual n=5 phantom problem

* With kinematic injection in place, 4/8 mecanum cells pen 10/10,
  but all four n=5 cells still fail.
* Diagnosis: the 360° LiDAR consistently produces sheep-shaped
  blobs at wall corners, gate posts, and pen rails. The consensus
  filter (`consensus_k=3`) doesn't reject them because they are
  *consistent*: they are always at the same world position.
* Bypass via `HERDING_USE_GT=1` (ground-truth perception) pens
  5/5 in 76 s, confirming the policy is fine and the gap is purely
  perceptual.
* **Fix:** static-phantom drop in the tracker: record each
  promoted track's spawn position and running max displacement;
  drop promoted tracks that have stayed within
  `STATIC_PHANTOM_RADIUS=0.4 m` of their spawn position for
  `STATIC_PHANTOM_AGE=400` steps (~6.4 s). Real sheep under
  Strömbom dynamics move well beyond that radius; wall corners
  do not. *(Implemented; results in Section 6.2 pending re-run.)*
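
The static-phantom rule can be sketched as follows. The two constants are the values quoted above; the `Track` class, its field names, and the helper functions are hypothetical and only illustrate the bookkeeping, not the project's tracker.

```python
import math
from dataclasses import dataclass

STATIC_PHANTOM_RADIUS = 0.4   # [m], value quoted in the text
STATIC_PHANTOM_AGE = 400      # [steps], value quoted in the text


@dataclass
class Track:
    spawn: tuple            # world position when the track was promoted
    pos: tuple              # latest world position
    age: int = 0            # steps since promotion
    max_disp: float = 0.0   # running max distance from the spawn position

    def update(self, pos):
        """Record a new position and refresh the displacement statistic."""
        self.pos = pos
        self.age += 1
        self.max_disp = max(self.max_disp, math.dist(self.spawn, pos))


def is_static_phantom(track):
    """True if the track is old enough and never left its spawn radius."""
    return (track.age >= STATIC_PHANTOM_AGE
            and track.max_disp < STATIC_PHANTOM_RADIUS)


def prune(tracks):
    """Drop every promoted track classified as a static phantom."""
    return [t for t in tracks if not is_static_phantom(t)]
```

Using the running maximum rather than the current displacement means a sheep that wandered off and came back to its spawn point is still kept.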

## 8. Discussion (1 page)

* Sim-to-real lessons:
  * Perception is the dominant transfer gap, not control.
  * Trackers need a notion of motion to reject static phantoms;
    consensus alone is insufficient when phantoms are spatially
    consistent.
  * For mecanum, kinematic injection is the correct abstraction.
* What we'd do differently:
  * Build the parallax/motion-aware tracker into the design from
    day one.
  * Calibrate Webots' mecanum behaviour earlier; we spent
    significant effort on ODE tuning before stepping back to the
    kinematic-injection approach.

## 9. Conclusion (¼ page)

Restate the contribution and the result counts. End on the open
question: parallax-aware tracking is a clean general fix and would
make 8/8 mecanum likely; we ran out of project budget.

## A. Reproducibility appendix (½ page)

* Hardware/OS used.
* Command lines for each row of the results tables.
* Random seed and deterministic eval settings.