Replace the failing ODE-rolled mecanum chassis dynamics with a Supervisor.setVelocity call that uses the gym mecanum forward kinematics formula directly. Wheel motors still spin (visual); chassis motion comes from the gym model so training and deployment match by construction. Results (seed=42, n=10 sheep): BC + RL mecanum pen 10/10 in both field and field_round. n=5 mecanum cells still 0/5 due to tracker phantoms anchored to wall corners under the 360° LiDAR — documented in docs/status.md as the remaining gap. Cleanup: drop deploy-time hacks (HERDING_HEADING_*, HERDING_OMEGA_CLAMP, HERDING_TRACKER_*) that were workarounds for the old ODE chaos; revert the proto inertiaMatrix, roller dampingConstant, and reduced motor torque since they no longer carry load; refresh comments around the mecanum config presets.
Project handoff — TRI_PROJ2 herding (2026-05-16)
Context for a fresh model picking this project up. Project deadline: 2026-06-04.
Branch: test/johnny8. Last commits: 876e14e (LSTM), dd5ac66 (core fixes).
What this project is
Group G25 course project: an autonomous shepherd dog that herds 1–10 sheep through a gate into a pen. Two worlds (rectangular field, circular field_round), two drives (differential, mecanum), and five control strategies:
- strombom: analytical Strömbom collect/drive heuristic
- sequential: analytical single-target pin-and-push baseline
- universal: analytical teacher used to collect BC demos
- bc: MLP policy trained via behaviour cloning of universal
- rl: KL-regularised PPO fine-tune of bc
The dog perceives sheep only through a front-mounted LiDAR (protos/ShepherdDog.proto).
A 2D Gym env (training/herding_env.py) is used for training and headless evaluation;
Webots is used for sim-to-deployment validation.
See docs/project.md for the formal course objectives. See
~/.claude/projects/-home-jalf-code-TRI-PROJ2/memory/ for the running notes
(project_state.md, dagger_results.md, lstm_results.md, webots_perception_gap.md).
What's working today
Everything below is verified, with command lines you can copy-paste.
Analytical strategies (Strömbom, Sequential, Universal)
Work in Webots with GT bypass (HERDING_USE_GT=1) — 12/12 trials across
both worlds × {5, 10 sheep}. User has signed off on GT bypass for these
analytical baselines (they take a position list as input; GT vs LiDAR is a
perception-layer concern, not a strategy concern).
Validated by webots_sweep_gt.log (full matrix, all OK).
Gym performance (clean 360° LiDAR sim, default tracker)
BC diff/field: 96% avg (90-100% across n=1..10)
RL diff/field: 99% avg (90-100%)
BC diff/round: 58% ← weak combo
RL diff/round: 58% ← weak combo
BC mec/field: 86%
RL mec/field: 90%
BC mec/round: 73%
RL mec/round: 79%
Plus a Stage-2 rl_fast time-penalty pass on diff/field and mec/field
(rl_fast_* directories) that slightly accelerates time-to-pen with similar
success.
Webots LiDAR — 360° proto variant (protos/ShepherdDog360.proto)
Created today as a robustness ablation. With this proto the analytical strategies and the v1 BC differential policy (trained on the default 360° gym LiDAR) transfer cleanly; BC mecanum and RL do not:
strombom/sequential/universal: 12/12 OK
bc diff (5 and 10 sheep, both worlds): 3/4 OK (only diff/field n=10 timed out)
bc mecanum: 0/4 — separate dynamics gap
rl any: 0/4 — RL more brittle than BC, unexpectedly
Validated by webots_sweep_360.log.
What does NOT work (despite multiple attempts)
Any learned policy (BC, RL, DAgger, LSTM) in Webots LiDAR with the canonical 140° FOV proto. All hit the same wall: tracker phantom-track patterns from real Webots LiDAR don't match what the gym FP-injection model produces, so policies trained on the gym proxy can't handle the obs they see in Webots.
Approaches tried today (all detailed in ~/.claude/projects/.../memory/):
| Approach | Gym proxy | Webots LiDAR 140° |
|---|---|---|
| v1 MLP + frame stack, clean training | 99% | 0/5 |
| DAgger (3 rounds, privileged teacher labels) | 12% → 38% on proxy | 0/5 |
| LSTM RecurrentPPO from scratch, 3M steps | 69% clean / 2% proxy | 0/5 |
Diagnosis: gym HERDING_WEBOTS preset (herding/config.py) is an
approximation but not faithful to actual Webots LiDAR. Real Webots produces
~4 phantom tracks per step for 5 real sheep due to wall/post/leg returns;
gym injection uses a Poisson process at static anchor points which is
distributionally different.
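To make the mismatch concrete, the gym-side injection described above can be pictured roughly as follows. This is a minimal sketch, not the code in herding/perception/lidar_sim.py: the function names, the `anchors` list, and the `jitter` parameter are assumptions made for illustration, and only the "Poisson count at static anchor points" idea comes from the diagnosis.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson(rate) sample with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def inject_false_positives(detections, anchors, fp_rate, rng, jitter=0.05):
    """Return detections plus a Poisson-distributed number of phantoms.

    Hypothetical helper: `anchors` are fixed (x, y) points such as wall
    corners, `fp_rate` is the mean phantom count per step, and `jitter`
    is an assumed Gaussian spread around each anchor.
    """
    out = list(detections)
    if not anchors or fp_rate <= 0.0:
        return out
    for _ in range(sample_poisson(fp_rate, rng)):
        ax, ay = rng.choice(anchors)
        out.append((ax + rng.gauss(0.0, jitter), ay + rng.gauss(0.0, jitter)))
    return out
```

A model of this shape yields phantoms that are independent from step to step, whereas the wall/post/leg returns seen in Webots persist and move with the dog's viewpoint, which is one plausible reason the tracker output differs.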
Critical bug fixes shipped today
If you're picking this up, these are real bugs that took hours to find:
- Webots controllers were silently crashing on numpy import. Webots launched them under system python3 (no numpy). Fixed by adding runtime.ini files at controllers/{shepherd_dog,sheep}/runtime.ini that point Webots to the conda env's python.
- FP_RATE mismatch BC=0 vs RL=2 poisoned PPO. Default in Makefile was FP_RATE=2.0 for RL but --fp-rate 0.0 hard-coded for BC demos. PPO stalled at 0% success for 1.46M steps. Now FP_RATE=0.0 consistent.
- Tracker phantom-penned tracks. pen_latch_depth=0.5 was too shallow (FPs at y≈-15 latched and lived forever). Now 2.0, and penned tracks decay at forget_steps × 8 instead of being eternal.
- HERDING_WEBOTS preset tuning in herding/config.py: max_new_tracks_per_step=1, static_reject=1.2. Reduces phantom-track spawning rate but doesn't eliminate it.
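For orientation, the pen-latch fix can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the code in herding/perception/sheep_tracker.py: the `Track` fields, the `depth_inside_pen` input, and both function names are hypothetical. Only the 2.0 latch depth and the forget_steps × 8 decay come from the notes above.

```python
from dataclasses import dataclass


@dataclass
class Track:
    x: float
    y: float
    penned: bool = False
    steps_unseen: int = 0


def update_latch(track, depth_inside_pen, pen_latch_depth=2.0):
    """Latch a track as penned only once it is deep enough inside the pen.

    `depth_inside_pen` (metres past the gate line) is an assumed input;
    a shallow threshold such as 0.5 lets phantoms near the gate latch.
    """
    if depth_inside_pen >= pen_latch_depth:
        track.penned = True
    return track.penned


def should_drop(track, forget_steps, penned_factor=8):
    """Penned tracks outlive free tracks by `penned_factor`, but still expire."""
    limit = forget_steps * penned_factor if track.penned else forget_steps
    return track.steps_unseen > limit
```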
Recommended path to a strong June 4 deliverable
You don't need to fix the 140° LiDAR gap — there's a defensible story already. The article framing writes itself:
"Wide-FOV (360°) LiDAR enables clean sim-to-real transfer of learned shepherding policies. Narrow-FOV (140°) introduces phantom-track noise that current policies cannot fully reject — closing this gap is future work, likely requiring either a faithful gym-side LiDAR model or Webots-in-the-loop training."
Concrete deliverable plan:
- Demo video and screenshots: use the 360° proto for BC/RL demonstrations and GT bypass for analyticals on 140°. All combos covered.
- Quantitative results: gym eval already gives success% and mean steps. Add a flock-dispersion metric (max(distances from CoM) at end of episode), about 30 lines in eval.py.
- Collision tracking: add a counter in HerdingEnv.step() for dog-sheep distance < 0.30 m. Currently the env knows about COLLISION_DIST but doesn't expose it in info. ~20 lines.
- Mecanum: the mecanum Webots dynamics gap is separate from the perception issue. tools/calibrate_mecanum.sh exists for this. Run it and see if it gives matching dynamics. This is the most valuable remaining technical task: closing the mecanum gap would let you complete the "diff vs mecanum" extra-merit comparison in docs/project.md.
- Round world: gym performance is ~58-79% across approaches. The curved walls break Strömbom's "stand behind the centroid" geometry (the position behind sometimes lies outside the field). Two cheap tweaks worth trying: (a) a per-episode W_RADIUS reward bonus for compact flocks (gather-first behavior), (b) curriculum on the env's difficulty knob (already wired in HerdingEnv).
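The two metric additions in the plan above could look roughly like this. It is a minimal sketch with hypothetical function names and plain (x, y) tuples, since the actual position types in eval.py and HerdingEnv are not shown here. The 0.30 m threshold is the figure quoted in the plan; whether it equals the env's COLLISION_DIST is an assumption.

```python
import math

COLLISION_DIST = 0.30  # metres; assumed to match the env constant


def flock_dispersion(sheep_positions):
    """Max distance of any sheep from the flock centre of mass (CoM)."""
    if not sheep_positions:
        return 0.0
    n = len(sheep_positions)
    cx = sum(x for x, _ in sheep_positions) / n
    cy = sum(y for _, y in sheep_positions) / n
    return max(math.hypot(x - cx, y - cy) for x, y in sheep_positions)


def count_collisions(dog_position, sheep_positions, threshold=COLLISION_DIST):
    """Number of sheep closer to the dog than `threshold` at this step."""
    dx, dy = dog_position
    return sum(
        1 for x, y in sheep_positions if math.hypot(x - dx, y - dy) < threshold
    )
```

In the env, the per-step count would be accumulated over the episode and returned through the info dict; in the evaluator, the dispersion would be computed once from the final sheep positions.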
Bonuses still on the table (from docs/project.md extra merit):
- Multi-shepherd axis-split — user's idea, ~1 day work. Each dog computes one component of the analytical Strömbom action. No multi-agent RL needed.
- Robustness / DR ablation — FP/wheel-slip knobs exist; run an ablation table.
Repository layout (essentials)
herding/
  config.py               # HerdingConfig dataclasses, HERDING_DEFAULT / HERDING_WEBOTS presets
  control/                # strombom.py, sequential.py, universal.py (analytical teachers)
  perception/             # lidar_sim.py, lidar_perception.py, sheep_tracker.py
  world/                  # diffdrive.py kinematics, flocking_sim.py, geometry.py (PEN_*/GATE_*/FIELD_*)
training/
  herding_env.py          # Gym env: HerdingEnv. ~560 lines. Step/reset/reward/obs.
  bc/
    collect.py            # Demo collector; supports --privileged and --dagger-policy
    pretrain.py           # MLP BC trainer (MSE + 1-cos loss)
  rl/
    train.py              # KL-regularised PPO fine-tune of BC
    train_lstm.py         # NEW today: RecurrentPPO (sb3-contrib) from scratch
  eval.py                 # Env-side evaluator; supports MLP + LSTM policies
  runs/                   # Trained artifacts (bc_*, rl_*, rl_fast_*, lstm_*)
    v1_clean/             # Backup of pre-DAgger artifacts
controllers/
  shepherd_dog/
    shepherd_dog.py       # Webots controller. Mode selection via HERDING_MODE env.
    policy_loader.py      # Auto-detects MLP vs LSTM zip. Handles obs / state.
    runtime.ini           # ← critical, points Webots to conda python
  sheep/
    runtime.ini           # ← same fix
protos/
  ShepherdDog.proto       # canonical 140° FOV (matches the physical robot)
  ShepherdDog360.proto    # 360° variant for the FOV ablation / fallback delivery
  ShepherdDogMecanum.proto
  Sheep.proto
worlds/
  field.wbt               # rectangular world
  field_round.wbt         # circular world
tools/
  run_webots.sh           # launcher: tools/run_webots.sh N MODE DRIVE WORLD
  webots_sweep.sh         # full LiDAR sweep across all modes × drives × worlds
  webots_sweep_gt.sh      # same but with HERDING_USE_GT=1
  dagger_round.sh         # NEW today: one-shot DAgger collect + train
  calibrate_mecanum.sh    # mecanum dynamics calibration (not run today)
Makefile                  # Top-level: make train_all, make eval_all, etc.
Quick commands
# Run pytest (111 tests, all passing)
make test
# Train one combo end-to-end (BC → RL → eval, ~1h on 2 cores)
make DRIVE=differential WORLD=field
# Train all 4 combos (~5h)
make train_all
# Eval an existing policy directory in gym
python -m training.eval --policy training/runs/rl_differential_field \
--max-flock 10 --max-steps 15000 --n-seeds 10 \
--drive-mode differential --world field
# Webots — analytical, GT bypass (this works for all combos)
HERDING_USE_GT=1 tools/run_webots.sh 5 strombom differential field
# Webots — BC with the 360° proto (currently the 140° proto is active;
# swap by editing protos/ShepherdDog.proto or use the 360° variant directly)
tools/run_webots.sh 5 bc differential field
# Headless full sweep (~80 min)
tools/webots_sweep.sh webots_sweep.log
# Train LSTM (sb3-contrib must be installed)
python -m training.rl.train_lstm \
--out training/runs/lstm_differential_field \
--total-timesteps 3000000 --use-webots-preset \
--drive-mode differential --world field
Hardware/environment
- 3.8 GB RAM, 8 GB swap, 2 cores. Memory pressure is real: saw the OS OOM-kill RL training during chained train_all once. If you re-run full pipelines, monitor memory and consider splitting.
- Conda env: tir at /home/jalf/miniconda3/envs/tir/. Has SB3, sb3-contrib, PyTorch, gymnasium. Webots controllers point to this python via the new runtime.ini files.
- Webots installed at /usr/local/webots/. Headless mode requires xvfb-run -a (no X display on this machine).
What I'd suggest for a fresh attempt at the 140° LiDAR gap
If the user wants you to keep pushing on it, the highest-EV experiment not yet tried is:
Consensus tracker — modify herding/perception/sheep_tracker.py to
require K consecutive detections within a small radius before promoting
a track to "real." Phantom tracks from sporadic wall returns wouldn't
survive the K-step consensus; real sheep continuously visible in FOV
would. The current max_new_tracks_per_step=1 rate-limits new tracks
but every detection still spawns one immediately.
Implementation sketch: add a "candidate" track type that doesn't appear
in get_positions(). After K (e.g. 3-5) consecutive matched detections,
promote candidate → real track. Roughly 30-50 lines of code.
This is a tracker-level fix at deploy time only, so it wouldn't require retraining the policies — v1 BC/RL should transfer cleanly if the tracker output looks more like what they were trained on (one position per real sheep, no phantoms).
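The candidate-then-promote idea can be sketched in isolation as below. This is a standalone illustration, not a patch against herding/perception/sheep_tracker.py: the class name, the `match_radius` default, and the nearest-neighbour matching are assumptions, and track expiry is left out for brevity. Only the K-consecutive-detections rule and the get_positions() name come from the description above.

```python
import math


class ConsensusTracker:
    """Promote a detection to a real track only after K consecutive matches."""

    def __init__(self, k=3, match_radius=0.5):
        self.k = k
        self.match_radius = match_radius  # assumed value, in metres
        self.candidates = []  # each item: [x, y, consecutive_hits]
        self.tracks = []      # each item: (x, y)

    def _nearest(self, items, x, y):
        """Index of the closest item within match_radius, or None."""
        best, best_d = None, self.match_radius
        for i, item in enumerate(items):
            d = math.hypot(item[0] - x, item[1] - y)
            if d <= best_d:
                best, best_d = i, d
        return best

    def update(self, detections):
        surviving = []
        for x, y in detections:
            i = self._nearest(self.tracks, x, y)
            if i is not None:
                self.tracks[i] = (x, y)  # refresh an established track
                continue
            j = self._nearest(self.candidates, x, y)
            hits = self.candidates.pop(j)[2] + 1 if j is not None else 1
            if hits >= self.k:
                self.tracks.append((x, y))  # consensus reached: promote
            else:
                surviving.append([x, y, hits])
        # Candidates with no match this step are dropped: the streak is broken.
        self.candidates = surviving

    def get_positions(self):
        """Only promoted tracks are visible; candidates stay hidden."""
        return list(self.tracks)
```

A sporadic wall return that appears for one or two steps never reaches K hits and so never shows up in get_positions(), while a sheep that stays in view is promoted after K steps. The cost is a K-step delay before a newly visible sheep is reported.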
I would NOT recommend more architectural training experiments (DAgger round 4, larger LSTM, etc.) — three independent approaches today already showed the bottleneck is upstream of the policy.