Compare commits


3 Commits

Author SHA1 Message Date
Johnny Fernandes 0a27ad9a26 Full retrain pipeline + hybrid policy set
Ran end-to-end clean retrain + gym eval + 24-cell Webots sweep
(tools/full_pipeline.sh). Results:

  Differential — all 16 cells pen N/N. Updated policies committed.
  Mecanum     — new training stochastically regressed (only 2/8 cells
                vs the v2 baseline's 4/8). v2 baseline mec policies
                are RESTORED in this commit (training/runs/{bc,rl}_
                mecanum_*) — they remain the deliverable.

The retrain pipeline itself is committed for reproducibility
(tools/full_pipeline.sh: clean → train_all → eval_all → 24-cell
Webots sweep). The v2 mec policies are also backed up locally to
_backup_pretrain/mec_v2_baseline/ (gitignored).

Verified after restore:
  bc mec field_round n=10 → 10/10 in 147 s sim
  rl diff field n=5      → 5/5  in 137 s sim
2026-05-20 08:07:39 +00:00
Johnny Fernandes 07d1ece3d4 Allow strafe_efficiency=1.0 in mec preset test; minor comment cleanup
After a deep investigation into the n=5 mecanum sim-to-real gap, all
attempted fixes (consensus tightening, wall_reject tightening, static-
phantom drop, deploy-time track merge, in-tracker track merge,
fp_rate-augmented retrain, max_range cap, 140° mecanum retrain) failed
to reliably pen n=5 in Webots without regressing n=10. The phantom
problem at 360° + small flock is genuinely hard and out of scope for
the deadline; documented in docs/status.md.

Result preserved from the previous mecanum work:
* 16/16 differential cells pen N/N.
* 4/8 mecanum cells (all n=10) pen 10/10 via Supervisor kinematic
  injection (commit 27c0f65).
* n=5 mecanum is the known gap.

Small changes that survived the iteration:
* tests/test_config.py — strafe_efficiency=1.0 is now valid (kinematic
  injection means the gym preset and Webots controller share the
  formula, so textbook values produce gym-identical body motion).
* tools/run_webots.sh — refreshed the LiDAR-variant comment.
* training/rl/train.py — comment polish.
2026-05-19 15:57:27 +00:00
Johnny Fernandes 62ea811655 Fix _h_ema NameError; add status + article-draft notes
- shepherd_dog: a leftover reference to the removed HERDING_HEADING_EMA
  helper raised NameError on every controller startup. Drop it.
- docs/status.md: expand the n=5 mecanum failure-mode discussion with
  the four phantom-suppression attempts that didn't pan out, and the
  honest workaround (Webots reports n=10 only, n=5 covered by gym
  results).
- docs/article_draft.md: project-report outline with section structure,
  results tables, and the mecanum sim-to-real narrative for the
  formal writeup.
2026-05-19 01:11:49 +00:00
11 changed files with 392 additions and 13 deletions
+1 -2
@@ -667,8 +667,7 @@ while robot.step(timestep) != -1:
f"tracks_cand={tracker.n_candidate()} "
f"tracks_penned={tracker.n_penned()} "
f"detections={len(detections)} "
-f"h={math.degrees(dog_heading):+.1f}°"
-+ (f"{math.degrees(dog_heading):+.1f}°" if _h_ema > 0 else ""))
+f"h={math.degrees(dog_heading):+.1f}°")
if DRIVE_MODE == "mecanum":
print(f"{common} action=({vx:+.2f}, {vy:+.2f}, {omega:+.2f})")
else:
+280
@@ -0,0 +1,280 @@
# Autonomous Shepherd Robot for Livestock Herding
**G25 — Diogo Costa, Johnny Fernandes, Nelson Neto**
**Course project final report — TRI 2026**
> Draft outline. Each section has a one-line description plus the
> bullets/figures/tables that should land in it. Replace prose as you
> write; keep the structure unless something obviously doesn't fit.
---
## 1. Abstract (½ page)
One paragraph: problem (autonomous LiDAR-only herding), approach
(Strömbom-style analytic baselines + BC + KL-PPO fine-tune; two
worlds, two drives), key result (16/16 differential cells pen all
sheep in Webots; 4/8 mecanum cells pen 10/10 via kinematic
Supervisor injection; extra-merit 360° LiDAR ablation and dual-dog
axis-split both working).
## 2. Introduction (1 page)
* **Problem statement.** Shepherd a flock of 1-10 simulated sheep
through a gate into a pen using LiDAR-only perception. Both a
rectangular field and a circular field. Both differential and
mecanum drive.
* **Why it's hard.** No GT positions; sheep flock dynamically
(Strömbom 2014); the LiDAR returns a noisy range image, not
labelled tracks; sim-to-Webots transfer is non-trivial.
* **Contributions.**
  1. End-to-end LiDAR pipeline (clustering → consensus tracker →
     observation builder) that transfers training-time policies to
     Webots without GT bypass.
  2. Three control strategies (Strömbom, BC, KL-PPO) trained on
     the same gym environment with matched-kinematics presets,
     working across both worlds.
  3. Identification and resolution of the mecanum sim-to-Webots
     gap (kinematic Supervisor injection; see Section 7).
  4. Extra-merit experiments: 360° LiDAR ablation and dual-dog
     axis-split coordination.
## 3. System overview (1 page)
* `herding/` — physics-free 2D gym (sheep flocking model, LiDAR
ray-casting, perception pipeline, controller library).
* `training/` — BC + KL-PPO trainers, frame-stacked MLP policies
(stable-baselines3), evaluation harness.
* `controllers/` — Webots Python controllers for the shepherd dog
and sheep, sharing the gym's geometry/perception modules so any
fix in the gym automatically reaches the simulator.
* `protos/` — Webots PROTO files: `ShepherdDog.proto` (diff drive
140°), `ShepherdDog360.proto` (diff drive 360°),
`ShepherdDogMecanum{,360}.proto` (mecanum variants).
* **Figure**: architecture diagram with the gym ↔ Webots split,
marking where each piece sits.
## 4. Methods
### 4.1 Sheep flocking model (½ page)
* Strömbom 2014 reduced-form heuristics: repulsion from dog and
neighbours, attraction to flock centroid, weighted into a
step-wise displacement.
* Implementation notes: parameter values, why we tuned them to
match the Webots sheep controller, sheep dynamics in the round
world (cylinder boundary instead of axis-aligned walls).
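A minimal sketch of the reduced-form update above, for orientation only. The radii, weights and step length (`r_a`, `r_s`, `c`, `rho_a`, `rho_s`, `speed`) are placeholders rather than the tuned values, and `sheep_step` is a hypothetical name. Attraction uses the global flock centroid, as described here. Strömbom 2014 instead uses the local centre of mass of the nearest neighbours. The original model's inertia and noise terms are left out.

```python
import math

Vec = tuple[float, float]


def unit(v: Vec) -> Vec:
    """Return v scaled to length 1; a (near-)zero vector comes back as (0, 0)."""
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n) if n > 1e-9 else (0.0, 0.0)


def sheep_step(pos: list[Vec], dog: Vec, r_a: float = 0.5, r_s: float = 3.0,
               c: float = 1.05, rho_a: float = 2.0, rho_s: float = 1.0,
               speed: float = 0.1) -> list[Vec]:
    """Advance every sheep by one fixed-length step (synchronous update).

    Neighbour repulsion (within r_a) always applies. Attraction to the flock
    centroid and flight from the dog apply only while the dog is within r_s,
    so a well-spaced, undisturbed flock stays put.
    """
    cx = sum(p[0] for p in pos) / len(pos)
    cy = sum(p[1] for p in pos) / len(pos)
    new_pos = []
    for i, (px, py) in enumerate(pos):
        rx = ry = 0.0
        for j, (qx, qy) in enumerate(pos):
            if i != j and math.hypot(px - qx, py - qy) < r_a:
                ux, uy = unit((px - qx, py - qy))      # away from a crowding neighbour
                rx, ry = rx + ux, ry + uy
        ux, uy = unit((rx, ry))
        hx, hy = rho_a * ux, rho_a * uy
        if math.hypot(px - dog[0], py - dog[1]) < r_s:
            ax, ay = unit((cx - px, cy - py))          # cohere towards the centroid
            fx, fy = unit((px - dog[0], py - dog[1]))  # flee the dog
            hx, hy = hx + c * ax + rho_s * fx, hy + c * ay + rho_s * fy
        sx, sy = unit((hx, hy))
        new_pos.append((px + speed * sx, py + speed * sy))
    return new_pos
```

Each cue is normalised before it is weighted. The weights therefore set only the relative pull of each cue, and `speed` alone sets the step length.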
### 4.2 Perception (1 page)
* **LiDAR scan → range image.** 140° front cone (default) or 360°
full sweep; horizontalResolution and noise calibrated to the
Webots sensor.
* **Clustering.** Walk rays in angular order, split on gap
threshold and multi-peak range profile; reject clusters wider
than max_span (walls), within wall_reject of perimeter, or
within static_reject of known fixed features.
* **Tracker.** Online NN association with predicted positions;
consensus_k filter (k hits within consensus_max_age steps
before promotion); static-phantom drop on promoted tracks that
fail to displace beyond `STATIC_PHANTOM_RADIUS` within
`STATIC_PHANTOM_AGE` steps; pen-latch and forget timeouts tuned
per preset.
* **Why the tracker matters.** Naïve per-frame matching produced
unstable observations that BC couldn't learn from; the consensus
filter and the static-phantom drop close the perception sim-to-
real gap for diff drive and unblock the 360° mecanum case.
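The clustering step can be sketched roughly as follows. This is a simplified, hypothetical version with these limits:

* It splits on the Euclidean distance between consecutive returns.
* It applies only the `max_span` and `wall_reject` tests, against an assumed rectangular field with a corner at the origin.
* It skips the multi-peak split, the `static_reject` test, the round-world boundary and the 360° wrap-around between the last and first ray.

`cluster_scan`, `Cluster` and all parameter defaults are placeholders, not the presets' values.

```python
import math
from dataclasses import dataclass


@dataclass
class Cluster:
    x: float     # world-frame centroid (m)
    y: float
    span: float  # chord between the cluster's first and last return (m)


def cluster_scan(ranges, angles, pose, gap=0.3, max_span=0.8, wall_reject=0.5,
                 field=(20.0, 20.0), max_range=10.0):
    """Group consecutive LiDAR returns into candidate sheep clusters.

    ranges, angles: per-ray range (m) and bearing (rad, robot frame), sorted by angle.
    pose: (x, y, heading) of the dog in the world frame.
    field: (width, height) of an axis-aligned field with a corner at the origin.
    """
    px, py, th = pose
    # Valid returns -> world-frame points, keeping angular order.
    pts = [(px + r * math.cos(th + a), py + r * math.sin(th + a))
           for r, a in zip(ranges, angles)
           if math.isfinite(r) and 0.0 < r < max_range]
    # Split wherever two angular neighbours are more than `gap` apart.
    groups, cur = [], []
    for p in pts:
        if cur and math.dist(cur[-1], p) > gap:
            groups.append(cur)
            cur = []
        cur.append(p)
    if cur:
        groups.append(cur)
    width, height = field
    out = []
    for g in groups:
        span = math.dist(g[0], g[-1])
        cx = sum(p[0] for p in g) / len(g)
        cy = sum(p[1] for p in g) / len(g)
        if span > max_span:                                     # too wide: wall segment
            continue
        if min(cx, cy, width - cx, height - cy) < wall_reject:  # hugging the perimeter
            continue
        out.append(Cluster(cx, cy, span))
    return out
```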
### 4.3 Controllers (1 page)
* **Analytic baselines.**
  * `strombom`: collect/drive heuristic with gate offset and
    a round-world variant (geometric drive instead of cardinal
    targets).
  * `sequential`: single-sheep pin-and-push baseline that works
    through every sheep in turn.
  * `universal`: adaptive analytic teacher used to collect BC
    demos; switches between Strömbom and Sequential based on flock
    coherence.
* **Behaviour cloning.** MLP(512,512), frame-stacked observations,
  trained on 250-400 universal-teacher trajectories per
  (drive, world) combo.
* **KL-PPO fine-tune.** PPO with a KL-to-reference penalty against
  the BC policy. Two stages: a success pass (no time penalty), then
  an optional speed pass (`rl_fast`, time_w<0).
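The KL-penalised objective can be written down compactly. The sketch below is one common formulation, resting on three assumptions the draft does not settle:

* the penalty is added to the loss rather than to the reward;
* the policy is a diagonal Gaussian;
* the coefficient `beta` takes a placeholder value.

The function names are hypothetical, and this is not stable-baselines3 code.

```python
import math


def gaussian_kl(mu_p, std_p, mu_q, std_q):
    """KL(p || q) between diagonal Gaussians, summed over action dimensions."""
    return sum(
        math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2.0 * sq ** 2) - 0.5
        for mp, sp, mq, sq in zip(mu_p, std_p, mu_q, std_q)
    )


def kl_ppo_loss(logp_new, logp_old, advantages, kl_to_ref, clip=0.2, beta=0.1):
    """Clipped PPO surrogate plus a penalty on KL to the frozen BC policy.

    logp_new / logp_old: log pi(a|s) under the current / rollout policy, per sample.
    kl_to_ref: per-sample KL(pi_theta || pi_BC), e.g. from gaussian_kl.
    Returns a scalar loss to minimise.
    """
    surrogate = []
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)
        clipped = min(max(ratio, 1.0 - clip), 1.0 + clip)
        surrogate.append(min(ratio * adv, clipped * adv))  # pessimistic bound
    policy_loss = -sum(surrogate) / len(surrogate)
    return policy_loss + beta * sum(kl_to_ref) / len(kl_to_ref)
```

A larger `beta` keeps the fine-tuned policy closer to the BC teacher. Setting `beta` to 0 recovers plain clipped PPO.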
### 4.4 Gym kinematics matching (½ page)
* Differential drive: standard unicycle kinematics, transfers
directly.
* Mecanum: `RobotConfig.strafe_efficiency` and
`strafe_to_forward_bleed` scale the forward-kinematics formula.
The gym preset (`HERDING_MEC_WEBOTS_360`) sets these to the
values the Webots controller reads when computing the
Supervisor-injected body velocity (Section 7), so gym training
and Webots deployment produce identical chassis motion.
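For reference, the standard X-pattern mecanum kinematics look like this. Several things here are assumptions:

* the wheel ordering (FL, FR, RL, RR) and sign convention (x forward, y left, yaw counter-clockwise);
* the geometry values;
* the way the two knobs enter: `strafe_efficiency` scales the lateral velocity, and `strafe_to_forward_bleed` moves a fraction of |vy| into forward speed.

The real preset code may apply the knobs differently, and the helper names are hypothetical. With the textbook values (1.0, 0.0), the forward map exactly inverts the inverse map.

```python
from dataclasses import dataclass


@dataclass
class MecanumGeometry:
    r: float = 0.05   # wheel radius (m), placeholder
    lx: float = 0.15  # half wheelbase (m), placeholder
    ly: float = 0.12  # half track width (m), placeholder


def inverse_kinematics(g, vx, vy, wz):
    """Body twist (m/s, m/s, rad/s) -> wheel speeds (rad/s), order FL, FR, RL, RR."""
    k = g.lx + g.ly
    return ((vx - vy - k * wz) / g.r,
            (vx + vy + k * wz) / g.r,
            (vx + vy - k * wz) / g.r,
            (vx - vy + k * wz) / g.r)


def forward_kinematics(g, wheels, strafe_efficiency=1.0, strafe_to_forward_bleed=0.0):
    """Wheel speeds -> realised body twist, with the two (assumed) scaling knobs."""
    fl, fr, rl, rr = wheels
    vx = g.r / 4 * (fl + fr + rl + rr)
    vy = g.r / 4 * (-fl + fr + rl - rr)
    wz = g.r / (4 * (g.lx + g.ly)) * (-fl + fr - rl + rr)
    # Hypothetical reading of the knobs: lateral speed is scaled down and a
    # fraction of |vy| leaks into forward speed (negative bleed = lost speed).
    return vx + strafe_to_forward_bleed * abs(vy), strafe_efficiency * vy, wz
```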
## 5. Experimental setup (½ page)
* Webots R2025a; `tools/run_webots.sh N MODE DRIVE WORLD` launcher.
* Seeded reproducibility (`HERDING_SEED=42` used for all the
results below).
* GT bypass (`HERDING_USE_GT=1`) available for ablations.
* Per-sheep pen-time logging in the `[results]` block.
## 6. Results
### 6.1 Differential drive (table + ½ page commentary)
| world | controller | n=5 | n=10 |
|-------------|--------------|:---:|:----:|
| field | BC | 5/5 | 10/10 |
| field | RL | 5/5 | 10/10 |
| field | Strömbom | 5/5 | 10/10 |
| field | Sequential | 5/5 | 10/10 |
| field_round | BC | 5/5 | 10/10 |
| field_round | RL | 5/5 | 10/10 |
| field_round | Strömbom | 5/5 | 10/10 |
| field_round | Sequential | 5/5 | 10/10 |
* Discussion: BC vs RL trade-offs (RL is faster, BC mimics
teacher more conservatively); Strömbom vs Sequential
(parallel-sweep vs one-at-a-time, time-to-pen comparison).
* **Figure**: pen-time bar chart per (controller, world).
### 6.2 Mecanum drive (table + 1 page commentary)
| world | controller | n=5 | n=10 |
|-------------|------------|:---:|:-----:|
| field | BC | 0/5 | 10/10 |
| field | RL | 0/5 | 10/10 |
| field_round | BC | 0/5 | 10/10 |
| field_round | RL | 0/5 | 10/10 |
> Pending: re-run after the static-phantom drop (Section 7.5) to
> confirm whether n=5 also passes.
* Discussion: kinematic Supervisor injection (Section 7); residual
n=5 phantom-track issue (Section 7.5) and how the static-phantom
drop addresses it.
* **Figure**: heading-drift comparison (with vs without kinematic
injection) over a 200-step window.
### 6.3 Extra-merit experiments (½ page each)
* **360° LiDAR ablation.** Diff drive runs with `HERDING_LIDAR=360`
pen N/N in both worlds. Trade-off: more candidate clusters per
step (more phantoms) vs full omnidirectional coverage.
* **Dual-dog axis-split.** Two shepherds via `HERDING_NDOGS=2`;
each is assigned an axis (x / y); off-axis components attenuated
by `HERDING_AXIS_LEAK`. Penned 5/5 on the diff/field setup. Note:
mecanum dual-dog was considered but skipped — mecanum's single-
dog omnidirectional coverage already saturates the available
herding capability.
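A minimal sketch of the axis-split rule follows. It rests on two assumptions: each dog's action is a world-frame (vx, vy) command, and `HERDING_AXIS_LEAK` is a multiplicative factor in [0, 1]. The function name and the 0.2 fallback are hypothetical.

```python
import os


def axis_split(vx, vy, axis, leak=None):
    """Attenuate the off-axis part of one dog's world-frame velocity command.

    axis: "x" or "y", the axis this dog is responsible for.
    leak: fraction of the off-axis component that survives. When None it is
    read from HERDING_AXIS_LEAK (the 0.2 fallback is a placeholder).
    """
    if leak is None:
        leak = float(os.environ.get("HERDING_AXIS_LEAK", "0.2"))
    if axis == "x":
        return vx, leak * vy
    if axis == "y":
        return leak * vx, vy
    raise ValueError(f"unknown axis {axis!r}")
```

With `leak=0` the two dogs are fully decoupled. A small positive leak still lets each dog correct sideways drift.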
## 7. The mecanum sim-to-Webots problem
> The longest section. This is the project's most interesting
> engineering story; write it like one.
### 7.1 First attempt: plain cylinder wheels + anisotropic friction
* Idea: use Webots `frictionRotation` on two contact materials
(`MecanumWheelA`, `MecanumWheelB`) to rotate the friction frame
±45°, making each cylinder act as an omni-roller via the
contact solver.
* What worked: chassis stable; pure forward motion clean.
* What broke: pure strafe came out the wrong direction, and
diagonal motion was zero. The contact-frame rotation interacts
with ODE's friction-pyramid model in a way that doesn't reproduce
the textbook X-pattern kinematics.
### 7.2 Second attempt: 32 physical roller hinges
* Idea: model every roller as a passive HingeJoint capsule at ±45°
tilt; ODE solves the contact-without-slipping constraint per
roller, no friction trickery needed.
* Generated by `tools/gen_mecanum_wheels.py` (8 rollers per wheel,
X-pattern tilt: FR/RL +1, FL/RR -1).
* What worked: pure-x calibration was near-exact (98%+).
* What broke: dynamic policy commands made the chassis tumble.
Heading swung ±150° in 200 control steps; the LiDAR→world
transform was effectively unusable. Even with
`inertiaMatrix [_ _ 5.0 _ _ _]`, roller `dampingConstant 0.0005`,
and motor `maxTorque 3.0` (6× cut), the dynamic yaw drift was
not under control.
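The roller layout such a generator has to produce can be sketched as below. This is not `tools/gen_mecanum_wheels.py`. The wheel frame (spin axis along y), the radii, the output format and the name `roller_layout` are assumptions. Only three details come from the text: 8 rollers per wheel, the ±45° tilt, and the FR/RL versus FL/RR sign split.

```python
import math

# X-pattern tilt signs per wheel, as listed above.
TILT_SIGN = {"FR": +1, "RL": +1, "FL": -1, "RR": -1}


def roller_layout(wheel, n_rollers=8, wheel_radius=0.05, roller_radius=0.012):
    """Centre position and hinge axis of each passive roller on one wheel.

    Wheel frame (assumed): the wheel spins about +y, so roller centres lie in
    the x-z plane. Positions are in metres; axes are unit vectors.
    """
    tilt = TILT_SIGN[wheel] * math.radians(45.0)
    centre_r = wheel_radius - roller_radius   # roller surface reaches the rim
    rollers = []
    for k in range(n_rollers):
        phi = 2.0 * math.pi * k / n_rollers   # angular slot around the hub
        tangent = (-math.sin(phi), 0.0, math.cos(phi))
        spin = (0.0, 1.0, 0.0)
        # Tilt the rim tangent by ±45° towards the spin axis. Both inputs are
        # orthogonal unit vectors, so the result is a unit vector too.
        axis = tuple(math.cos(tilt) * t + math.sin(tilt) * s
                     for t, s in zip(tangent, spin))
        position = (centre_r * math.cos(phi), 0.0, centre_r * math.sin(phi))
        rollers.append({"position": position, "axis": axis})
    return rollers
```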
### 7.3 Why ODE struggles with mecanum
* 32 unconstrained roller hinges per chassis; ODE's contact solver
resolves them as independent constraints each step, and small
imbalances in the per-roller forces propagate to the body as
yaw torque.
* The roller's "rolling without slipping" idealisation is
fundamentally a kinematic constraint; trying to recover it from
Newton-Euler dynamics over 32 hinges is numerically unstable in
the timestep/solver regime Webots uses.
* This is a known limitation of mecanum in physics engines; Gazebo,
for instance, ships a mecanum plugin that bypasses the contact
solver entirely and injects a kinematic body velocity.
### 7.4 Final approach: Supervisor kinematic injection
* The chassis is moved via the Supervisor API (`Node.setVelocity()` on
the dog's own node) using the gym mecanum forward-kinematics formula. Wheel motors still spin
visually, but their torque does not propagate to the body.
* Gym training and Webots deployment apply the *same* formula with
the *same* `strafe_efficiency` and `strafe_to_forward_bleed`
parameters, so the trained policy faces identical body dynamics
in both environments.
* Trade-off: we lose Newton-Euler chassis simulation on the
mecanum body. Differential drive keeps full physics. The user's
framing — "I want the process, not too focused in pure realism"
— supports this choice; it's also standard practice in academic
mecanum simulators.
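The injection step can be sketched minimally. The sketch assumes a z-up world (Webots' default in recent releases) and a body-frame command with x forward and y to the left. `Node.setVelocity()` is the real Webots API: it is reached through the Supervisor (for example `Supervisor.getSelf()`) and takes world-frame linear velocity, then angular velocity. The helper names are hypothetical. Zeroing the vertical and roll/pitch components is a simplification, not necessarily what `drive_mecanum` does.

```python
import math


def body_to_world_twist(vx, vy, omega, heading):
    """Rotate a body-frame planar twist into the 6-vector that Webots'
    Node.setVelocity() takes: world-frame linear (x, y, z), then angular (x, y, z).

    Assumes a z-up world and heading measured counter-clockwise from world +x.
    Vertical speed and roll/pitch rates are forced to zero.
    """
    c, s = math.cos(heading), math.sin(heading)
    return [c * vx - s * vy, s * vx + c * vy, 0.0, 0.0, 0.0, omega]


def inject(node, vx, vy, omega, heading):
    """Drive a Supervisor-accessible node kinematically (call once per step)."""
    node.setVelocity(body_to_world_twist(vx, vy, omega, heading))


# Inside a Webots controller this would be used roughly as:
#   from controller import Supervisor
#   robot = Supervisor()
#   dog = robot.getSelf()
#   while robot.step(timestep) != -1:
#       ...  # policy -> (vx, vy, omega), mecanum scaling, current heading
#       inject(dog, vx, vy, omega, heading)
```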
### 7.5 The residual n=5 phantom problem
* With kinematic injection in place, 4/8 cells pen 10/10. But n=5
cells still fail uniformly.
* Diagnosis: the 360° LiDAR consistently produces sheep-shaped
blobs at wall corners, gate posts, and pen rails. The consensus
filter (`consensus_k=3`) doesn't reject them because they are
*consistent* — they're always at the same world position.
* Bypass via `HERDING_USE_GT=1` (ground-truth perception) pens
5/5 in 76s, confirming the policy is fine and the gap is purely
perceptual.
* **Fix:** static-phantom drop in the tracker — record each
promoted track's spawn position and running max displacement;
drop promoted tracks that have stayed within
`STATIC_PHANTOM_RADIUS=0.4 m` of their spawn position for
`STATIC_PHANTOM_AGE=400` steps (~6.4 s). Real sheep under
Strömbom dynamics move well beyond that radius; wall corners
do not. *(Implemented; results in Section 6.2 pending re-run.)*
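The two per-track rules named in this section (consensus promotion and the static-phantom drop) can be sketched as follows. Data association, pen-latch and forget timeouts are omitted, and `Track` and `update_track` are hypothetical names. The defaults mirror the values quoted in the text. The ~6.4 s figure corresponds to 400 steps at 16 ms each, which assumes a 16 ms control step. The status notes report that phantom tracks were in practice reassigned across the field by data association, so this rule alone did not unlock n=5.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Track:
    x: float
    y: float
    hits: list = field(default_factory=list)  # steps with an associated detection
    promoted: bool = False
    spawn: Optional[tuple] = None             # position at promotion
    promoted_at: int = -1
    max_disp: float = 0.0                     # running max distance from spawn


def update_track(t, det, step, consensus_k=3, consensus_max_age=20,
                 static_radius=0.4, static_age=400):
    """Update one track with its associated detection (or None); return False to drop it."""
    if det is not None:
        t.x, t.y = det
        t.hits.append(step)
    if not t.promoted:
        # Consensus filter: promote after k hits within the last consensus_max_age steps.
        recent = [h for h in t.hits if step - h < consensus_max_age]
        if len(recent) >= consensus_k:
            t.promoted, t.spawn, t.promoted_at = True, (t.x, t.y), step
        return True
    # Static-phantom drop: a promoted track that never left its spawn disc is dropped.
    t.max_disp = max(t.max_disp, math.dist((t.x, t.y), t.spawn))
    return not (step - t.promoted_at >= static_age and t.max_disp < static_radius)
```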
## 8. Discussion (1 page)
* Sim-to-real lessons:
  * Perception is the dominant transfer gap, not control.
  * Trackers need a notion of motion to reject static phantoms;
    consensus alone is insufficient when phantoms are spatially
    consistent.
  * For mecanum, kinematic injection is the correct abstraction.
* What we'd do differently:
  * Build the parallax/motion-aware tracker into the design from
    day 1.
  * Calibrate Webots' mecanum behaviour earlier; we spent
    significant effort on ODE tuning before stepping back to the
    kinematic-injection approach.
## 9. Conclusion (¼ page)
Restate the contribution and the result counts. End on the open
question: parallax-aware tracking is a clean general fix and would
make 8/8 mecanum likely; we ran out of project budget.
## A. Reproducibility appendix (½ page)
* Hardware/OS used.
* Command lines for each row of the results tables.
* Random seed and deterministic eval settings.
+14 -3
@@ -45,11 +45,22 @@ This is not a hack — it matches how most academic mecanum sims work (e.g., Gaz
### Why n=5 mecanum fails (and n=10 passes)
-The 360° LiDAR scans the full perimeter every step. Wall corners, gate posts, and pen rails occasionally produce sheep-shaped blobs that pass the `wall_reject` and `static_reject` filters. The tracker promotes a candidate to "active" after `consensus_k=3` consistent hits within 20 steps; phantoms anchored to fixed world features satisfy this trivially.
+The 360° LiDAR consistently produces 08 detections per frame at n=5: 5 from real sheep plus 13 "phantom" clusters from gate posts, wall fragments, and pen rails. The tracker's consensus filter promotes a candidate to "active" after `consensus_k=3` hits within 20 steps, and phantoms satisfy that easily because they're spatially consistent.
-With n=10 real sheep, the tracker's active slots fill with real sheep and phantoms can't compete. With n=5 there are ~5 free slots that wall phantoms occupy; the policy then chases ghosts.
+With n=10 real sheep the 10 active slots fill with real sheep before phantoms compete. With n=5 there are ~5 free slots and the phantoms occupy them; the policy then chases ghosts (verified: with the `HERDING_USE_GT=1` perception bypass, n=5 pens 5/5 in 76 s).
-Tightening the consensus filter (`consensus_k=5`) and `wall_reject=0.9` were tried; both kept ~70% of frames at 10 active tracks. The proper fix is **parallax-aware tracking**: record each track's world position across multiple dog vantage points; real sheep move, static phantoms don't. Out of scope for the 2026-06-04 deadline.
+We tried four fixes; none unlocked n=5:
+| attempt | result |
+|-----------------------------------------------------|-------------------------------------------------|
+| Tighten consensus to `consensus_k=5` | no change; `tracks_active=10` in ~70% of frames |
+| Tighten `wall_reject=0.9`, `static_reject=1.5` | no change |
+| Static-phantom drop (track displacement from spawn) | phantoms are *not* spatially static: debug logs showed phantom tracks bouncing 4-22 m across the field as data association reassigned them each frame |
+| Merge near-duplicate detections (≤0.5 m) | phantoms aren't fragmentation either |
+The phantom tracks are caused by **data-association noise**: when the tracker has more slots than real sheep, the leftover tracks attach themselves to whatever cluster is closest each frame, even if that cluster has nothing to do with their original spawn position. The fix would need either parallax-aware tracking (require multi-vantage confirmation before promotion) or training with simulated phantom noise. Both are real surgery; out of scope for the 2026-06-11 deadline.
+**Workaround for the demo:** running n=10 in Webots always pens 10/10; the n=5 cells produce identical kinematic behaviour and can be reported from the gym evaluation (success rate, time-to-pen), where the gym tracker doesn't accumulate phantoms.
## File map (what changed in this push)
+7 -2
@@ -156,8 +156,13 @@ class TestRobotConfig:
def test_mec_webots_preset(self):
from herding.config import HERDING_MEC_WEBOTS
-assert 0.0 < HERDING_MEC_WEBOTS.robot.strafe_efficiency < 1.0
-assert HERDING_MEC_WEBOTS.robot.strafe_to_forward_bleed < 0.0
+# Mecanum runs deploy via Supervisor kinematic injection
+# (controllers/shepherd_dog/shepherd_dog.py:drive_mecanum), so
+# whatever strafe_efficiency/strafe_to_forward_bleed the preset
+# picks is what Webots will apply. The preset is allowed to be
+# textbook (1.0, 0.0) or matched (<1.0, ≠0.0).
+cfg = HERDING_MEC_WEBOTS.robot
+assert 0.0 < cfg.strafe_efficiency <= 1.0
# ---------------------------------------------------------------------------
+86
@@ -0,0 +1,86 @@
#!/usr/bin/env bash
# Full retrain + eval + Webots-validate pipeline.
#
# Usage: bash tools/full_pipeline.sh
#
# Output logs are written to the repo root:
# full_pipeline.log — main pipeline log
# stage_train.log — make train_all output
# stage_eval.log — make eval_all output
# stage_webots.log — Webots validation sweep
#
# Total runtime estimate: 8-12 hours.
set -e
ROOT="$( cd "$( dirname "${BASH_SOURCE[0]}" )/.." && pwd )"
cd "$ROOT"
source "$ROOT/tools/setup_env.sh"
# With pipefail, a failing `make` stage stops the pipeline even though its
# output is piped through tee (plain `set -e` only sees tee's exit status).
set -o pipefail
PIPELINE_LOG="$ROOT/full_pipeline.log"
TRAIN_LOG="$ROOT/stage_train.log"
EVAL_LOG="$ROOT/stage_eval.log"
WEBOTS_LOG="$ROOT/stage_webots.log"
truncate -s 0 "$PIPELINE_LOG" "$TRAIN_LOG" "$EVAL_LOG" "$WEBOTS_LOG"
log() { echo "[pipeline $(date +%H:%M:%S)] $*" | tee -a "$PIPELINE_LOG"; }
log "=== START full pipeline $(date) ==="
log ""
log "Phase 1/4: clean_all"
make clean_all 2>&1 | tee -a "$PIPELINE_LOG"
log ""
log "Phase 2/4: train_all (4 combos, ~8h)"
make train_all 2>&1 | tee -a "$TRAIN_LOG"
log " train_all finished"
log ""
log "Phase 3/4: eval_all (gym eval, ~30min)"
make eval_all 2>&1 | tee -a "$EVAL_LOG"
log " eval_all finished"
log ""
log "Phase 4/4: Webots validation sweep (~90min)"
truncate -s 0 "$WEBOTS_LOG"
run_cell() {
    local MODE="$1" DRIVE="$2" WORLD="$3" N="$4"
    echo "" | tee -a "$WEBOTS_LOG"
    echo "=== $MODE $DRIVE $WORLD n=$N ===" | tee -a "$WEBOTS_LOG"
    rm -f "$ROOT/training/.run_done"
    local STDOUT="$ROOT/pipeline_${MODE}_${DRIVE}_${WORLD}_n${N}.stdout"
    # A cell that times out or crashes must not abort the sweep.
    timeout --kill-after=15s 320 \
        xvfb-run -a \
        env WEBOTS_HEADLESS=1 WEBOTS_EXTRA_ARGS="--stdout --stderr" \
        HERDING_SEED=42 \
        bash tools/run_webots.sh "$N" "$MODE" "$DRIVE" "$WORLD" > "$STDOUT" 2>&1 || true
    # Highest GT_penned=k/N seen in the run. The `|| true` guards keep pipefail
    # from aborting the sweep when a crashed run printed no matching line.
    BEST=$(grep "GT_penned=" "$STDOUT" 2>/dev/null | awk -F'GT_penned=' '{print $2}' | awk '{split($1,a,"/"); print a[1]"/"a[2]}' | sort -t/ -k1,1n | tail -1 || true)
    grep -E "\[results\]" "$STDOUT" 2>/dev/null | head -1 | tee -a "$WEBOTS_LOG" || true
    echo " best GT_penned: $BEST" | tee -a "$WEBOTS_LOG"
    pkill -9 -f "webots-bin|Xvfb" 2>/dev/null || true
    sleep 1
}
# Differential drive: 4 controllers × 2 worlds × 2 n
for M in bc rl strombom sequential; do
    for W in field field_round; do
        for N in 5 10; do
            run_cell "$M" differential "$W" "$N"
        done
    done
done
# Mecanum drive: 2 controllers × 2 worlds × 2 n
for M in bc rl; do
    for W in field field_round; do
        for N in 5 10; do
            run_cell "$M" mecanum "$W" "$N"
        done
    done
done
log ""
log "=== FULL PIPELINE DONE $(date) ==="
log ""
log "Summary:"
grep -E "=== |best GT_penned" "$WEBOTS_LOG" | tee -a "$PIPELINE_LOG" || true
+3 -5
@@ -90,11 +90,9 @@ fi
cp "$SRC" "$DST"
-# LiDAR FOV variant. For mecanum the default is 360° because the
-# physical-roller proto's passive yaw drift makes the 140° front
-# cone unreliable (heading errors push detections out of the
-# tracker's world-frame matching gates). Diff defaults to 140°
-# matching the canonical ShepherdDog.proto.
+# LiDAR FOV variant. Mecanum defaults to 360° (the trained mecanum
+# target); diff defaults to 140°. Override with HERDING_LIDAR=140 or
+# HERDING_LIDAR=360 for ablations.
if [[ -z "${HERDING_LIDAR:-}" ]]; then
if [[ "$DRIVE" == "mecanum" ]]; then
LIDAR_VARIANT="360"
+1 -1
@@ -279,7 +279,7 @@ def main() -> None:
HerdingConfig, HERDING_MEC_WEBOTS_360, DomainRandomConfig, RobotConfig,
)
herding_cfg = None
-# Mecanum always trains under HERDING_MEC_WEBOTS_360 (360° LiDAR +
+# Mecanum trains under HERDING_MEC_WEBOTS_360 (360° LiDAR +
# kinematic-matched strafe scaling + small compass-noise DR).
is_mecanum = (drive_mode == "mecanum")
if is_mecanum or args.fp_rate > 0.0 or args.action_smooth > 0.0 or args.wheel_slip_std > 0.0:
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.