Each wheel is now a hub solid + 8 passive HingeJoint rollers (capsules
tilted 45° in body xy plane at the bottom contact point) instead of
a single plain Cylinder. The rollers free-spin around their tilt axes
so the wheel exhibits mecanum X-pattern behaviour: gym-frame strafe
commands now produce body strafe in Webots, where before they
produced wrong-direction motion (the plain cylinders behaved as
4-wheel skid-steer).

Calibration on flat field, 200 steps each:

  command          gym predict    webots out      err
  vx=0.5 vy=0      1.33 m/s +x     1.19 m/s +x    10.9% +x
                   0 m/s +y       -0.10 m/s +y    ~clean
  vx=0 vy=0.5      1.33 m/s +y     0.50 m/s +y    62.1% +y
                   0 m/s +x       -0.37 m/s +x    noticeable mecanum coupling

Strafe is imperfect (-x bleed-through, magnitude under-shoot) but
direction is correct and the platform is now omnidirectional. Forward
motion is high-fidelity. Tilt signs assigned so diagonal pairs FL+RR
and FR+RL share the same body-frame roller orientation (the standard
X pattern). Two contact-material names "MecanumWheelA/B" are kept for
diagnostic separation; both use the same isotropic Coulomb friction
of 2.0 with forceDependentSlip 0.005.
tools/run_webots.sh ships the matching contactProperties block on
every mecanum launch (re-emitted into the temporary world copy).
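
For reference, the X-pattern roller assignment corresponds to the textbook idealised mecanum inverse kinematics, sketched below. This is not the Webots controller code: the function name, the wheel radius and the half-dimensions are illustrative assumptions, with +x forward and +y to the left.

```python
def mecanum_wheel_speeds(vx, vy, wz, r=0.05, lx=0.15, ly=0.15):
    """Ideal wheel angular speeds (rad/s) for an X-pattern mecanum base.

    vx, vy: body-frame velocity in m/s (+x forward, +y left).
    wz: yaw rate in rad/s (counter-clockwise positive).
    r, lx, ly: wheel radius, half wheelbase, half track (assumed values).
    """
    k = lx + ly
    return {
        "FL": (vx - vy - k * wz) / r,
        "FR": (vx + vy + k * wz) / r,
        "RL": (vx + vy - k * wz) / r,
        "RR": (vx - vy + k * wz) / r,
    }


# Pure forward: all four wheels turn the same way.
forward = mecanum_wheel_speeds(0.5, 0.0, 0.0)
# Pure strafe: the diagonal pairs FL+RR and FR+RL turn in opposite directions.
strafe = mecanum_wheel_speeds(0.0, 0.5, 0.0)
```

In the ideal model a strafe command produces no x velocity at all, so the -0.37 m/s bleed-through in the calibration table is a property of the simulated contact, not of the kinematics.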

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Two changes that together raise diff/round gym success ~52%→88% (BC)
and ~68%→88% (RL) without retraining; diff/field stays at 100%.
* TrackerConfig.consensus_k default 1 → 3 (radius 0.5 m, max_age 15
frames). The same candidate-promotion mechanism that closed the
Webots LiDAR gap also filters gym tracker phantoms — they show up
on the round field where sheep run further between detection
cycles than GATE_M, so each new position spawns a fresh track
while the stale one persists in memory. SheepTracker() called with
no tracker_cfg keeps the legacy pass-through behaviour for
backwards compatibility.
* Strömbom + universal teachers now detect when the natural
"behind the flock" drive target leaves the curved boundary and
fall back to pushing the flock radially inward toward the centre.
Breaks the wall-circling pattern that previously trapped both the
analytical baselines and the trained policies.
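
The boundary fallback in the second bullet can be pictured with the geometric sketch below. It is one plausible reading of the description, not the teacher code itself: the function name, the `standoff` distance and the returned mode label are hypothetical.

```python
import math


def drive_target(flock_com, goal, centre, radius, standoff=1.0):
    """Pick the dog's drive target on a circular field.

    Normally the target sits `standoff` metres behind the flock centre of
    mass, on the side away from the goal. If that point falls outside the
    circle, fall back to a point on the outward side of the flock (clamped
    to the boundary) so the push is radially inward.
    """
    fx, fy = flock_com
    gx, gy = goal
    cx, cy = centre
    # Unit vector from goal to flock: "behind the flock" as seen from the goal.
    dx, dy = fx - gx, fy - gy
    d = math.hypot(dx, dy) or 1.0
    tx, ty = fx + standoff * dx / d, fy + standoff * dy / d
    if math.hypot(tx - cx, ty - cy) <= radius:
        return (tx, ty), "behind_flock"
    # Fallback: stand on the ray from the centre through the flock.
    rx, ry = fx - cx, fy - cy
    rd = math.hypot(rx, ry) or 1.0
    reach = min(rd + standoff, radius)
    return (cx + reach * rx / rd, cy + reach * ry / rd), "radial_inward"


# Flock hugging the top wall of a 10 m circle, goal on the right-hand edge.
target, mode = drive_target((0.0, 9.5), (10.0, 0.0), (0.0, 0.0), 10.0)
```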
A/B numbers (n_sheep ∈ {1,2,3,5,10}, 5 seeds each, max_steps=15000):
diff/field bc: baseline 100% consensus 100%
diff/field rl: baseline 100% consensus 100%
diff/round bc: baseline 52% consensus 88%
diff/round rl: baseline 68% consensus 88%

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

`active=$(grep -c '^Sheep' "$DST")` prints 0 but exits with status 1
when no sheep are left in the world, which trips `set -e` and kills the
script before it can launch Webots. Append `|| true` so the
calibration mode (N=0) can actually run.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Three deploy-time fixes that take v1 360°-trained BC/RL from 0/n to
n/n penned (9/10 for RL at n=10) on the canonical 140° LiDAR proto for
diff/field:
* SheepTracker now supports a consensus stage: new detections start as
candidate tracks invisible to get_positions(). A candidate must
accumulate consensus_k matches within consensus_radius_m of itself
inside a consensus_max_age window to be promoted; otherwise it
expires. Real sheep self-confirm within 3 frames (they move ≪0.05 m/step);
wall-return cluster centroids jitter beyond 0.3 m as the dog moves
and never promote. consensus_k=1 (default) is a no-op so unconfigured
callers and HERDING_DEFAULT keep prior behaviour.
* HERDING_WEBOTS preset gets consensus_k=3, radius=0.3, max_age=20,
plus longer forget_steps=300 and predict_steps=180 so confirmed
sheep persist through the long FOV-occlusion gaps that a narrow 140°
cone produces. max_new_tracks_per_step=1 still rate-caps spawn bursts.
* shepherd_dog.py BC/RL empty-obs fallback now rotates the desired
heading with step_count so the cone actively sweeps the field
instead of driving due north into the wall.
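
The candidate-promotion stage from the first bullet can be sketched as below. This is a minimal illustration under assumptions, not the SheepTracker implementation: the class and field names are hypothetical, matching is nearest-first within the radius, and association with already-confirmed tracks is left out.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    x: float
    y: float
    hits: int = 1  # detections matched so far, including the first
    age: int = 0   # frames since the candidate was created


class ConsensusStage:
    def __init__(self, consensus_k=3, consensus_radius_m=0.3, consensus_max_age=20):
        self.k = consensus_k
        self.radius = consensus_radius_m
        self.max_age = consensus_max_age
        self.candidates = []

    def update(self, detections):
        """Feed one frame of (x, y) detections; return newly promoted positions."""
        promoted = []
        for c in self.candidates:
            c.age += 1
        for x, y in detections:
            match = next(
                (c for c in self.candidates
                 if math.hypot(c.x - x, c.y - y) <= self.radius),
                None,
            )
            if match is None:
                match = Candidate(x, y)
                self.candidates.append(match)
            else:
                match.x, match.y = x, y
                match.hits += 1
            if match.hits >= self.k:
                promoted.append((match.x, match.y))
                self.candidates.remove(match)
        # Candidates that did not confirm inside the window expire.
        self.candidates = [c for c in self.candidates if c.age <= self.max_age]
        return promoted


stage = ConsensusStage()
# A slow-moving sheep confirms on its third frame.
sheep_frames = [stage.update([(1.00 + 0.02 * i, 1.0)]) for i in range(3)]
```

With `consensus_k=1` a detection is promoted on its first frame, which is the pass-through behaviour the bullet describes.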
Verified in headless Webots (HERDING_USE_GT=0, LiDAR only):
BC diff/field: 5/5 @ 11698, 10/10 @ 15079
RL diff/field: 5/5 @ 10039, 9/10 @ 18200 (timeout)
Strömbom diff/field: 5/5 @ 7528
All previously 0/n. 120 unit tests pass; 9 new consensus tests cover
the candidate stage, promotion radius, and one-shot phantom rejection.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Adds RecurrentPPO-based training as an alternative to MLP+frame-stack.
The LSTM gives the policy unbounded temporal memory, addressing the
partial-obs failure mode of the 140° Webots LiDAR (tracker briefly
empties when the dog turns; sporadic phantom tracks confuse decisions).
* training/rl/train_lstm.py: from-scratch RecurrentPPO trainer (no BC
init, no KL term since there's no reference). Uses HERDING_WEBOTS
preset so the obs distribution matches deployment.
* training/eval.py: auto-detects RecurrentPPO zips, maintains LSTM
hidden state across steps, resets between episodes.
* controllers/shepherd_dog/policy_loader.py: PolicyHandle supports
recurrent policies — state managed inside, reset_recurrent() exposed.
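
A sketch of the state handling described for eval.py and PolicyHandle follows. The wrapper class is hypothetical; it only assumes a model whose `predict` accepts `state`, `episode_start` and `deterministic` keywords, which is the call shape sb3-contrib's RecurrentPPO uses (there `episode_start` is normally a NumPy bool array with one flag per env). A counting stub stands in for the model so the example runs without the library.

```python
class RecurrentHandleSketch:
    """Hypothetical wrapper that keeps the LSTM state between calls."""

    def __init__(self, model):
        self.model = model
        self.reset_recurrent()

    def reset_recurrent(self):
        # Call between episodes so memory does not carry over a reset.
        self._state = None
        self._episode_start = True

    def act(self, obs):
        action, self._state = self.model.predict(
            obs,
            state=self._state,
            episode_start=[self._episode_start],  # single env, one flag
            deterministic=True,
        )
        self._episode_start = False
        return action


class CountingStub:
    """Stand-in model: its 'hidden state' counts steps since the last reset."""

    def predict(self, obs, state=None, episode_start=None, deterministic=True):
        steps = 0 if (state is None or episode_start[0]) else state
        steps += 1
        return steps, steps


handle = RecurrentHandleSketch(CountingStub())
first_episode = [handle.act(obs=None) for _ in range(3)]
handle.reset_recurrent()
second_episode = [handle.act(obs=None) for _ in range(2)]
```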
Result on diff/field after 3M steps:
- Gym (default 360°): 69% avg success across n=1..10
- Gym (HERDING_WEBOTS preset, training env): 2% full success; it
usually pens 3-4 of 5 sheep but rarely all 5
- Webots LiDAR 140°: 0/5 (same wall as DAgger and v1 policies)
Conclusion: architectural changes (LSTM vs MLP) don't close the
perception sim-to-real gap. The gym LiDAR sim doesn't faithfully
reproduce Webots phantom-track distribution; any policy trained on the
gym proxy fails to handle real Webots phantoms regardless of
architecture. Closing this gap requires either modeling Webots phantom
patterns in the gym sim (multi-day work) or Webots-in-the-loop
training (very slow). See memory/lstm_results.md for details.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Today's session worked across the full Webots delivery stack — found and
fixed a cluster of bugs blocking the BC/RL transfer, then explored
training-side mitigations for the residual perception gap.
Bug fixes:
- Makefile FP_RATE default 2.0 → 0.0: BC demos used fp_rate=0 but RL
fine-tune defaulted to fp_rate=2, poisoning the BC obs distribution
and stalling PPO at 0% success across 1.46M+ steps.
- controllers/{shepherd_dog,sheep}/runtime.ini: Webots was launching
controllers under system python3 (no numpy) and they were crashing
silently. Pinned to the conda tir env.
- herding/config.py HERDING_WEBOTS preset: pen_latch_depth 0.5 → 2.0,
max_new_tracks_per_step 3 → 1, static_reject 0.8 → 1.2. Stops phantom
FPs near the gate from latching as permanently-penned tracks.
- herding/perception/sheep_tracker.py: penned tracks now decay at
forget_steps × 8 instead of living forever. Adds get_positions
min_freshness filter for deploy-time use.
Training/eval matched to deployment:
- training/bc/collect.py: --dagger-policy flag for DAgger rollouts
(policy drives, teacher labels) + --use-webots-preset for matched
140° tracker + DR config.
- controllers/shepherd_dog/shepherd_dog.py: scan-fallback (0, 0.6) when
BC/RL sees empty sheep_positions — recovers from FOV gaps.
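
The DAgger rollout mentioned above (policy drives, teacher labels) reduces to the loop below. It is a generic sketch on a toy 1-D task, not the interface of training/bc/collect.py; every name in it is hypothetical.

```python
def dagger_rollout(reset, step, policy, teacher, max_steps):
    """Roll out `policy`, recording the teacher's action at every visited state."""
    dataset = []
    obs = reset()
    for _ in range(max_steps):
        dataset.append((obs, teacher(obs)))  # label comes from the teacher
        obs, done = step(obs, policy(obs))   # transition comes from the learner
        if done:
            break
    return dataset


# Toy 1-D task: move an integer position to 0.
def reset():
    return 5


def step(pos, action):
    new_pos = pos + action
    return new_pos, new_pos == 0


def teacher(pos):
    return -1 if pos > 0 else (1 if pos < 0 else 0)


def policy(pos):
    return 1  # deliberately wrong learner: always steps away from 0


data = dagger_rollout(reset, step, policy, teacher, max_steps=3)
```

The learner drifts to states the teacher alone would never visit, and each of those states still gets the teacher's corrective label, which is the point of the DAgger round.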
Tooling:
- tools/dagger_round.sh: one-shot DAgger round (collect + concat + bc).
- tools/webots_sweep_gt.sh: full sweep with HERDING_USE_GT=1 for the
perception-gap diagnosis matrix.
- protos/ShepherdDog360.proto: 360° FOV variant for the FOV-ablation
comparison. Canonical proto stays at 140° per project spec.
Artifacts: v1 BC/RL policies for all 4 (drive × world) combos trained
in clean gym (success: diff/field 90-100%, diff/round 58%, mec/field
60-100%, mec/round 50-100%). DAgger r1/r2 BCs for diff/field show a
12%→38% progression on the gym HERDING_WEBOTS proxy but did not
transfer to actual Webots LiDAR (0/5 throughout). Next: LSTM policy or
learned tracker per the project-state memory.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>