27c0f65722
Replace the failing ODE-rolled mecanum chassis dynamics with a Supervisor.setVelocity call that uses the gym mecanum forward kinematics formula directly. Wheel motors still spin (visual); chassis motion comes from the gym model so training and deployment match by construction. Results (seed=42, n=10 sheep): BC + RL mecanum pen 10/10 in both field and field_round. n=5 mecanum cells still 0/5 due to tracker phantoms anchored to wall corners under the 360° LiDAR — documented in docs/status.md as the remaining gap. Cleanup: drop deploy-time hacks (HERDING_HEADING_*, HERDING_OMEGA_CLAMP, HERDING_TRACKER_*) that were workarounds for the old ODE chaos; revert the proto inertiaMatrix, roller dampingConstant, and reduced motor torque since they no longer carry load; refresh comments around the mecanum config presets.
288 lines
12 KiB
Markdown
288 lines
12 KiB
Markdown
# Project handoff — TRI_PROJ2 herding (2026-05-16)
|
||
|
||
Context for a fresh model picking this project up. Project deadline: **2026-06-04**.
|
||
Branch: `test/johnny8`. Last commits: `876e14e` (LSTM), `dd5ac66` (core fixes).
|
||
|
||
---
|
||
|
||
## What this project is
|
||
|
||
Group G25 course project: an autonomous shepherd dog that herds 1–10 sheep through a gate into a pen. Two worlds (rectangular `field`, circular `field_round`), two drives (`differential`, `mecanum`), and five control strategies:
|
||
|
||
- `strombom` — analytical Strömbom collect/drive heuristic
|
||
- `sequential` — analytical single-target pin-and-push baseline
|
||
- `universal` — analytical teacher used to collect BC demos
|
||
- `bc` — MLP policy trained via behaviour cloning of `universal`
|
||
- `rl` — KL-regularised PPO fine-tune of `bc`
|
||
|
||
The dog perceives sheep only through a front-mounted LiDAR (`protos/ShepherdDog.proto`).
|
||
A 2D Gym env (`training/herding_env.py`) is used for training and headless evaluation;
|
||
Webots is used for sim-to-deployment validation.
|
||
|
||
See `docs/project.md` for the formal course objectives. See
|
||
`~/.claude/projects/-home-jalf-code-TRI-PROJ2/memory/` for the running notes
|
||
(`project_state.md`, `dagger_results.md`, `lstm_results.md`, `webots_perception_gap.md`).
|
||
|
||
---
|
||
|
||
## What's working today
|
||
|
||
Everything below is **verified**, with command lines you can copy-paste.
|
||
|
||
### Analytical strategies (Strömbom, Sequential, Universal)
|
||
|
||
Work in Webots with **GT bypass** (`HERDING_USE_GT=1`) — 12/12 trials across
|
||
both worlds × {5, 10 sheep}. User has signed off on GT bypass for these
|
||
analytical baselines (they take a position list as input; GT vs LiDAR is a
|
||
perception-layer concern, not a strategy concern).
|
||
|
||
Validated by `webots_sweep_gt.log` (full matrix, all OK).
|
||
|
||
### Gym performance (clean 360° LiDAR sim, default tracker)
|
||
|
||
```
|
||
BC diff/field: 96% avg (90-100% across n=1..10)
|
||
RL diff/field: 99% avg (90-100%)
|
||
BC diff/round: 58% ← weak combo
|
||
RL diff/round: 58% ← weak combo
|
||
BC mec/field: 86%
|
||
RL mec/field: 90%
|
||
BC mec/round: 73%
|
||
RL mec/round: 79%
|
||
```
|
||
|
||
Plus a Stage-2 `rl_fast` time-penalty pass on diff/field and mec/field
|
||
(`rl_fast_*` directories) that slightly accelerates time-to-pen with similar
|
||
success.
|
||
|
||
### Webots LiDAR — 360° proto variant (`protos/ShepherdDog360.proto`)
|
||
|
||
Created today as a robustness ablation. v1 policies (trained on default 360°
|
||
gym LiDAR) transfer cleanly:
|
||
|
||
```
|
||
strombom/sequential/universal: 12/12 OK
|
||
bc diff (5 and 10 sheep, both worlds): 3/4 OK (only diff/field n=10 timed out)
|
||
bc mecanum: 0/4 — separate dynamics gap
|
||
rl any: 0/4 — RL more brittle than BC, unexpectedly
|
||
```
|
||
|
||
Validated by `webots_sweep_360.log`.
|
||
|
||
---
|
||
|
||
## What does NOT work (despite multiple attempts)
|
||
|
||
**Any learned policy (BC, RL, DAgger, LSTM) in Webots LiDAR with the
|
||
canonical 140° FOV proto.** All hit the same wall: tracker phantom-track
|
||
patterns from real Webots LiDAR don't match what the gym FP-injection model
|
||
produces, so policies trained on the gym proxy can't handle the obs they see
|
||
in Webots.
|
||
|
||
Approaches tried today (all detailed in `~/.claude/projects/.../memory/`):
|
||
|
||
| Approach | Gym proxy | Webots LiDAR 140° |
|
||
|---|---|---|
|
||
| v1 MLP + frame stack, clean training | 99% | 0/5 |
|
||
| DAgger (3 rounds, privileged teacher labels) | 12% → 38% on proxy | 0/5 |
|
||
| LSTM RecurrentPPO from scratch, 3M steps | 69% clean / 2% proxy | 0/5 |
|
||
|
||
Diagnosis: gym `HERDING_WEBOTS` preset (`herding/config.py`) is an
|
||
approximation but not faithful to actual Webots LiDAR. Real Webots produces
|
||
~4 phantom tracks per step for 5 real sheep due to wall/post/leg returns;
|
||
gym injection uses a Poisson process at static anchor points which is
|
||
distributionally different.
|
||
|
||
---
|
||
|
||
## Critical bug fixes shipped today
|
||
|
||
If you're picking this up, these are real bugs that took hours to find:
|
||
|
||
1. **Webots controllers were silently crashing on numpy import.** Webots
|
||
launched them under system `python3` (no numpy). Fixed by adding
|
||
`runtime.ini` files at `controllers/{shepherd_dog,sheep}/runtime.ini`
|
||
that point Webots to the conda env's python.
|
||
|
||
2. **FP_RATE mismatch BC=0 vs RL=2 poisoned PPO.** Default in Makefile was
|
||
`FP_RATE=2.0` for RL but `--fp-rate 0.0` hard-coded for BC demos. PPO
|
||
stalled at 0% success for 1.46M steps. Now `FP_RATE=0.0` consistent.
|
||
|
||
3. **Tracker phantom-penned tracks.** `pen_latch_depth=0.5` was too shallow
|
||
(FPs at y≈-15 latched and lived forever). Now 2.0, and penned tracks
|
||
decay at `forget_steps × 8` instead of being eternal.
|
||
|
||
4. **HERDING_WEBOTS preset tuning** in `herding/config.py` —
|
||
`max_new_tracks_per_step=1`, `static_reject=1.2`. Reduces phantom-track
|
||
spawning rate but doesn't eliminate it.
|
||
|
||
---
|
||
|
||
## Recommended path to a strong June 4 deliverable
|
||
|
||
You don't need to fix the 140° LiDAR gap — there's a defensible story
|
||
already. The article framing writes itself:
|
||
|
||
> "Wide-FOV (360°) LiDAR enables clean sim-to-real transfer of learned
|
||
> shepherding policies. Narrow-FOV (140°) introduces phantom-track noise
|
||
> that current policies cannot fully reject — closing this gap is future
|
||
> work, likely requiring either a faithful gym-side LiDAR model or
|
||
> Webots-in-the-loop training."
|
||
|
||
Concrete deliverable plan:
|
||
|
||
1. **Demo video and screenshots**: use the 360° proto for BC/RL demonstrations
|
||
and GT bypass for analyticals on 140°. All combos covered.
|
||
2. **Quantitative results**: gym eval already gives success%, mean steps.
|
||
Add a flock-dispersion metric (`max(distances from CoM)` at end of
|
||
episode) — about 30 lines in `eval.py`.
|
||
3. **Collision tracking**: add a counter in `HerdingEnv.step()` for
|
||
`dog-sheep distance < 0.30 m`. Currently the env knows about
|
||
`COLLISION_DIST` but doesn't expose it in info. ~20 lines.
|
||
4. **Mecanum**: the mecanum Webots dynamics gap is **separate** from the
|
||
perception issue. `tools/calibrate_mecanum.sh` exists for this. Run
|
||
it and see if it gives matching dynamics. This is the most valuable
|
||
remaining technical task — closing the mecanum gap would let you
|
||
complete the "diff vs mecanum" extra-merit comparison in
|
||
`docs/project.md`.
|
||
5. **Round world**: gym performance is ~58-79% across approaches. The
|
||
curved walls break Strömbom's "stand behind the centroid" geometry
|
||
(the position behind sometimes lies outside the field). Two cheap
|
||
tweaks worth trying: (a) a per-episode `W_RADIUS` reward bonus for
|
||
compact flocks (gather-first behavior), (b) curriculum on the env's
|
||
`difficulty` knob (already wired in `HerdingEnv`).
|
||
|
||
Bonuses still on the table (from `docs/project.md` extra merit):
|
||
- **Multi-shepherd axis-split** — user's idea, ~1 day work. Each dog
|
||
computes one component of the analytical Strömbom action. No multi-agent
|
||
RL needed.
|
||
- **Robustness / DR ablation** — FP/wheel-slip knobs exist; run an ablation
|
||
table.
|
||
|
||
---
|
||
|
||
## Repository layout (essentials)
|
||
|
||
```
|
||
herding/
|
||
config.py # HerdingConfig dataclasses, HERDING_DEFAULT / HERDING_WEBOTS presets
|
||
control/ # strombom.py, sequential.py, universal.py (analytical teachers)
|
||
perception/ # lidar_sim.py, lidar_perception.py, sheep_tracker.py
|
||
world/ # diffdrive.py kinematics, flocking_sim.py, geometry.py (PEN_*/GATE_*/FIELD_*)
|
||
|
||
training/
|
||
herding_env.py # Gym env: HerdingEnv. ~560 lines. Step/reset/reward/obs.
|
||
bc/
|
||
collect.py # Demo collector — supports --privileged and --dagger-policy
|
||
pretrain.py # MLP BC trainer (MSE + 1-cos loss)
|
||
rl/
|
||
train.py # KL-regularised PPO fine-tune of BC
|
||
train_lstm.py # NEW today: RecurrentPPO (sb3-contrib) from scratch
|
||
eval.py # Env-side evaluator; supports MLP + LSTM policies
|
||
runs/ # Trained artifacts (bc_*, rl_*, rl_fast_*, lstm_*)
|
||
v1_clean/ # Backup of pre-DAgger artifacts
|
||
|
||
controllers/
|
||
shepherd_dog/
|
||
shepherd_dog.py # Webots controller. Mode selection via HERDING_MODE env.
|
||
policy_loader.py # Auto-detects MLP vs LSTM zip. Handles obs / state.
|
||
runtime.ini # ← critical, points Webots to conda python
|
||
sheep/
|
||
runtime.ini # ← same fix
|
||
|
||
protos/
|
||
ShepherdDog.proto # canonical 140° FOV (matches the physical robot)
|
||
ShepherdDog360.proto # 360° variant for the FOV ablation / fallback delivery
|
||
ShepherdDogMecanum.proto
|
||
Sheep.proto
|
||
|
||
worlds/
|
||
field.wbt # rectangular world
|
||
field_round.wbt # circular world
|
||
|
||
tools/
|
||
run_webots.sh # launcher: tools/run_webots.sh N MODE DRIVE WORLD
|
||
webots_sweep.sh # full LiDAR sweep across all modes × drives × worlds
|
||
webots_sweep_gt.sh # same but with HERDING_USE_GT=1
|
||
dagger_round.sh # NEW today: one-shot DAgger collect + train
|
||
calibrate_mecanum.sh # mecanum dynamics calibration (not run today)
|
||
|
||
Makefile # Top-level: make train_all, make eval_all, etc.
|
||
```
|
||
|
||
---
|
||
|
||
## Quick commands
|
||
|
||
```bash
|
||
# Run pytest (111 tests, all passing)
|
||
make test
|
||
|
||
# Train one combo end-to-end (BC → RL → eval, ~1h on 2 cores)
|
||
make DRIVE=differential WORLD=field
|
||
|
||
# Train all 4 combos (~5h)
|
||
make train_all
|
||
|
||
# Eval an existing policy directory in gym
|
||
python -m training.eval --policy training/runs/rl_differential_field \
|
||
--max-flock 10 --max-steps 15000 --n-seeds 10 \
|
||
--drive-mode differential --world field
|
||
|
||
# Webots — analytical, GT bypass (this works for all combos)
|
||
HERDING_USE_GT=1 tools/run_webots.sh 5 strombom differential field
|
||
|
||
# Webots — BC with the 360° proto (currently the 140° proto is active;
|
||
# swap by editing protos/ShepherdDog.proto or use the 360° variant directly)
|
||
tools/run_webots.sh 5 bc differential field
|
||
|
||
# Headless full sweep (~80 min)
|
||
tools/webots_sweep.sh webots_sweep.log
|
||
|
||
# Train LSTM (sb3-contrib must be installed)
|
||
python -m training.rl.train_lstm \
|
||
--out training/runs/lstm_differential_field \
|
||
--total-timesteps 3000000 --use-webots-preset \
|
||
--drive-mode differential --world field
|
||
```
|
||
|
||
---
|
||
|
||
## Hardware/environment
|
||
|
||
- 3.8 GB RAM, 8 GB swap, 2 cores. Memory pressure is real — saw the
|
||
OS OOM-kill RL training during chained `train_all` once. If you re-run
|
||
full pipelines, monitor memory and consider splitting.
|
||
- Conda env: `tir` at `/home/jalf/miniconda3/envs/tir/`. Has SB3,
|
||
sb3-contrib, PyTorch, gymnasium. Webots controllers point to this
|
||
python via the new `runtime.ini` files.
|
||
- Webots installed at `/usr/local/webots/`. Headless mode requires
|
||
`xvfb-run -a` (no X display on this machine).
|
||
|
||
---
|
||
|
||
## What I'd suggest for a fresh attempt at the 140° LiDAR gap
|
||
|
||
If the user wants you to keep pushing on it, the highest-EV experiment
|
||
not yet tried is:
|
||
|
||
**Consensus tracker** — modify `herding/perception/sheep_tracker.py` to
|
||
require K consecutive detections within a small radius before promoting
|
||
a track to "real." Phantom tracks from sporadic wall returns wouldn't
|
||
survive the K-step consensus; real sheep continuously visible in FOV
|
||
would. The current `max_new_tracks_per_step=1` rate-limits new tracks
|
||
but every detection still spawns one immediately.
|
||
|
||
Implementation sketch: add a "candidate" track type that doesn't appear
|
||
in `get_positions()`. After K (e.g. 3-5) consecutive matched detections,
|
||
promote candidate → real track. Roughly 30-50 lines of code.
|
||
|
||
This is a tracker-level fix at deploy time only, so it wouldn't require
|
||
retraining the policies — v1 BC/RL should transfer cleanly if the tracker
|
||
output looks more like what they were trained on (one position per real
|
||
sheep, no phantoms).
|
||
|
||
I would NOT recommend more architectural training experiments (DAgger
|
||
round 4, larger LSTM, etc.) — three independent approaches today already
|
||
showed the bottleneck is upstream of the policy.
|