TIR_PROJ/docs/handoff.md
Johnny Fernandes 27c0f65722 Mecanum Webots via Supervisor kinematic injection
Replace the failing ODE-rolled mecanum chassis dynamics with a
Supervisor.setVelocity call that uses the gym mecanum forward
kinematics formula directly. Wheel motors still spin (visual);
chassis motion comes from the gym model so training and deployment
match by construction.

Results (seed=42, n=10 sheep): BC + RL mecanum pen 10/10 in both
field and field_round. n=5 mecanum cells still 0/5 due to tracker
phantoms anchored to wall corners under the 360° LiDAR — documented
in docs/status.md as the remaining gap.

Cleanup: drop deploy-time hacks (HERDING_HEADING_*, HERDING_OMEGA_CLAMP,
HERDING_TRACKER_*) that were workarounds for the old ODE chaos;
revert the proto inertiaMatrix, roller dampingConstant, and reduced
motor torque since they no longer carry load; refresh comments
around the mecanum config presets.
2026-05-18 22:46:37 +00:00

# Project handoff — TRI_PROJ2 herding (2026-05-16)
Context for a fresh model picking this project up. Project deadline: **2026-06-04**.
Branch: `test/johnny8`. Last commits: `876e14e` (LSTM), `dd5ac66` (core fixes).
---
## What this project is
Group G25 course project: an autonomous shepherd dog that herds 1-10 sheep through a gate into a pen. Two worlds (rectangular `field`, circular `field_round`), two drives (`differential`, `mecanum`), and five control strategies:
- `strombom` — analytical Strömbom collect/drive heuristic
- `sequential` — analytical single-target pin-and-push baseline
- `universal` — analytical teacher used to collect BC demos
- `bc` — MLP policy trained via behaviour cloning of `universal`
- `rl` — KL-regularised PPO fine-tune of `bc`
The dog perceives sheep only through a front-mounted LiDAR (`protos/ShepherdDog.proto`).
A 2D Gym env (`training/herding_env.py`) is used for training and headless evaluation;
Webots is used for sim-to-deployment validation.
See `docs/project.md` for the formal course objectives. See
`~/.claude/projects/-home-jalf-code-TRI-PROJ2/memory/` for the running notes
(`project_state.md`, `dagger_results.md`, `lstm_results.md`, `webots_perception_gap.md`).
---
## What's working today
Everything below is **verified**, with command lines you can copy-paste.
### Analytical strategies (Strömbom, Sequential, Universal)
Work in Webots with **GT bypass** (`HERDING_USE_GT=1`) — 12/12 trials across
both worlds × {5, 10 sheep}. User has signed off on GT bypass for these
analytical baselines (they take a position list as input; GT vs LiDAR is a
perception-layer concern, not a strategy concern).
Validated by `webots_sweep_gt.log` (full matrix, all OK).
### Gym performance (clean 360° LiDAR sim, default tracker)
```
BC diff/field: 96% avg (90-100% across n=1..10)
RL diff/field: 99% avg (90-100%)
BC diff/round: 58% ← weak combo
RL diff/round: 58% ← weak combo
BC mec/field: 86%
RL mec/field: 90%
BC mec/round: 73%
RL mec/round: 79%
```
Plus a Stage-2 `rl_fast` time-penalty pass on diff/field and mec/field
(`rl_fast_*` directories) that slightly accelerates time-to-pen with similar
success.
### Webots LiDAR — 360° proto variant (`protos/ShepherdDog360.proto`)
Created today as a robustness ablation. v1 policies (trained on default 360°
gym LiDAR) transfer cleanly:
```
strombom/sequential/universal: 12/12 OK
bc diff (5 and 10 sheep, both worlds): 3/4 OK (only diff/field n=10 timed out)
bc mecanum: 0/4 — separate dynamics gap
rl any: 0/4 — RL more brittle than BC, unexpectedly
```
Validated by `webots_sweep_360.log`.
---
## What does NOT work (despite multiple attempts)
**Any learned policy (BC, RL, DAgger, LSTM) in Webots LiDAR with the
canonical 140° FOV proto.** All hit the same wall: tracker phantom-track
patterns from real Webots LiDAR don't match what the gym FP-injection model
produces, so policies trained on the gym proxy can't handle the obs they see
in Webots.
Approaches tried today (all detailed in `~/.claude/projects/.../memory/`):
| Approach | Gym proxy | Webots LiDAR 140° |
|---|---|---|
| v1 MLP + frame stack, clean training | 99% | 0/5 |
| DAgger (3 rounds, privileged teacher labels) | 12% → 38% on proxy | 0/5 |
| LSTM RecurrentPPO from scratch, 3M steps | 69% clean / 2% proxy | 0/5 |
Diagnosis: gym `HERDING_WEBOTS` preset (`herding/config.py`) is an
approximation but not faithful to actual Webots LiDAR. Real Webots produces
~4 phantom tracks per step for 5 real sheep due to wall/post/leg returns;
gym injection uses a Poisson process at static anchor points which is
distributionally different.
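To make the mismatch concrete, here is a minimal sketch of what "a Poisson process at static anchor points" could look like. It is not the code in `herding/perception/lidar_sim.py`: the function names, the uniform jitter, and the anchor list are assumptions for illustration only.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw a Poisson-distributed count using Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def inject_false_positives(anchors, rate, jitter, rng):
    """Hypothetical per-step FP injection.

    Draws a Poisson(rate) number of phantom detections and places each one
    near a randomly chosen static anchor (e.g. a wall corner or gate post),
    with uniform jitter of +/- `jitter` metres on each axis.
    """
    phantoms = []
    for _ in range(sample_poisson(rate, rng)):
        ax, ay = rng.choice(anchors)
        phantoms.append((ax + rng.uniform(-jitter, jitter),
                         ay + rng.uniform(-jitter, jitter)))
    return phantoms


# Example: two assumed anchor points, on average 2 phantoms per step.
rng = random.Random(42)
step_phantoms = inject_false_positives([(10.0, -15.0), (-10.0, -15.0)], 2.0, 0.1, rng)
```

In a model like this the phantom count is drawn independently at every step, so phantoms flicker in and out. If real Webots returns are closer to a steady ~4 phantoms that persist from step to step, the tracker sees a different temporal pattern, which would be consistent with the diagnosis above.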
---
## Critical bug fixes shipped today
If you're picking this up, these are real bugs that took hours to find:
1. **Webots controllers were silently crashing on numpy import.** Webots
launched them under system `python3` (no numpy). Fixed by adding
`runtime.ini` files at `controllers/{shepherd_dog,sheep}/runtime.ini`
that point Webots to the conda env's python.
2. **FP_RATE mismatch BC=0 vs RL=2 poisoned PPO.** Default in Makefile was
`FP_RATE=2.0` for RL but `--fp-rate 0.0` hard-coded for BC demos. PPO
stalled at 0% success for 1.46M steps. Now `FP_RATE=0.0` consistent.
3. **Tracker phantom-penned tracks.** `pen_latch_depth=0.5` was too shallow
(FPs at y≈-15 latched and lived forever). Now 2.0, and penned tracks
decay at `forget_steps × 8` instead of being eternal.
4. **HERDING_WEBOTS preset tuning** in `herding/config.py`:
`max_new_tracks_per_step=1`, `static_reject=1.2`. This reduces the
phantom-track spawning rate but doesn't eliminate it.
---
## Recommended path to a strong June 4 deliverable
You don't need to fix the 140° LiDAR gap — there's a defensible story
already. The article framing writes itself:
> "Wide-FOV (360°) LiDAR enables clean sim-to-real transfer of learned
> shepherding policies. Narrow-FOV (140°) introduces phantom-track noise
> that current policies cannot fully reject — closing this gap is future
> work, likely requiring either a faithful gym-side LiDAR model or
> Webots-in-the-loop training."
Concrete deliverable plan:
1. **Demo video and screenshots**: use the 360° proto for BC/RL demonstrations
and GT bypass for analyticals on 140°. All combos covered.
2. **Quantitative results**: gym eval already gives success%, mean steps.
Add a flock-dispersion metric (`max(distances from CoM)` at end of
episode) — about 30 lines in `eval.py`.
3. **Collision tracking**: add a counter in `HerdingEnv.step()` for
`dog-sheep distance < 0.30 m`. Currently the env knows about
`COLLISION_DIST` but doesn't expose it in info. ~20 lines.
4. **Mecanum**: the mecanum Webots dynamics gap is **separate** from the
perception issue. `tools/calibrate_mecanum.sh` exists for this. Run
it and see if it gives matching dynamics. This is the most valuable
remaining technical task — closing the mecanum gap would let you
complete the "diff vs mecanum" extra-merit comparison in
`docs/project.md`.
5. **Round world**: gym performance is ~58-79% across approaches. The
curved walls break Strömbom's "stand behind the centroid" geometry
(the position behind sometimes lies outside the field). Two cheap
tweaks worth trying: (a) a per-episode `W_RADIUS` reward bonus for
compact flocks (gather-first behavior), (b) curriculum on the env's
`difficulty` knob (already wired in `HerdingEnv`).
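The two metrics in items 2 and 3 can be sketched as standalone helpers. This is a minimal sketch, not the project's code: the function names and the plain `(x, y)` tuple inputs are assumptions, and the 0.30 m threshold is the value quoted in item 3, assumed to match the env's `COLLISION_DIST`.

```python
import math

# Threshold quoted in item 3; assumed to equal the env's COLLISION_DIST.
COLLISION_DIST = 0.30  # metres


def flock_dispersion(sheep_xy):
    """Largest distance from any sheep to the flock centre of mass (CoM)."""
    if not sheep_xy:
        return 0.0
    cx = sum(x for x, _ in sheep_xy) / len(sheep_xy)
    cy = sum(y for _, y in sheep_xy) / len(sheep_xy)
    return max(math.hypot(x - cx, y - cy) for x, y in sheep_xy)


def count_collisions(dog_xy, sheep_xy, threshold=COLLISION_DIST):
    """Number of sheep closer to the dog than `threshold` at this step."""
    dx, dy = dog_xy
    return sum(1 for x, y in sheep_xy
               if math.hypot(x - dx, y - dy) < threshold)


# Example: two sheep 2 m apart, dog almost touching the first one.
sheep = [(0.0, 0.0), (2.0, 0.0)]
print(flock_dispersion(sheep))               # 1.0
print(count_collisions((0.1, 0.0), sheep))   # 1
```

One way to wire these in would be to accumulate `count_collisions` per step inside `HerdingEnv.step()` and expose the running total through `info`, while `flock_dispersion` is evaluated once on the final sheep positions in `eval.py`.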
Bonuses still on the table (from `docs/project.md` extra merit):
- **Multi-shepherd axis-split** — user's idea, ~1 day work. Each dog
computes one component of the analytical Strömbom action. No multi-agent
RL needed.
- **Robustness / DR ablation** — FP/wheel-slip knobs exist; run an ablation
table.
---
## Repository layout (essentials)
```
herding/
config.py # HerdingConfig dataclasses, HERDING_DEFAULT / HERDING_WEBOTS presets
control/ # strombom.py, sequential.py, universal.py (analytical teachers)
perception/ # lidar_sim.py, lidar_perception.py, sheep_tracker.py
world/ # diffdrive.py kinematics, flocking_sim.py, geometry.py (PEN_*/GATE_*/FIELD_*)
training/
herding_env.py # Gym env: HerdingEnv. ~560 lines. Step/reset/reward/obs.
bc/
collect.py # Demo collector — supports --privileged and --dagger-policy
pretrain.py # MLP BC trainer (MSE + 1-cos loss)
rl/
train.py # KL-regularised PPO fine-tune of BC
train_lstm.py # NEW today: RecurrentPPO (sb3-contrib) from scratch
eval.py # Env-side evaluator; supports MLP + LSTM policies
runs/ # Trained artifacts (bc_*, rl_*, rl_fast_*, lstm_*)
v1_clean/ # Backup of pre-DAgger artifacts
controllers/
shepherd_dog/
shepherd_dog.py # Webots controller. Mode selection via HERDING_MODE env.
policy_loader.py # Auto-detects MLP vs LSTM zip. Handles obs / state.
runtime.ini # ← critical, points Webots to conda python
sheep/
runtime.ini # ← same fix
protos/
ShepherdDog.proto # canonical 140° FOV (matches the physical robot)
ShepherdDog360.proto # 360° variant for the FOV ablation / fallback delivery
ShepherdDogMecanum.proto
Sheep.proto
worlds/
field.wbt # rectangular world
field_round.wbt # circular world
tools/
run_webots.sh # launcher: tools/run_webots.sh N MODE DRIVE WORLD
webots_sweep.sh # full LiDAR sweep across all modes × drives × worlds
webots_sweep_gt.sh # same but with HERDING_USE_GT=1
dagger_round.sh # NEW today: one-shot DAgger collect + train
calibrate_mecanum.sh # mecanum dynamics calibration (not run today)
Makefile # Top-level: make train_all, make eval_all, etc.
```
---
## Quick commands
```bash
# Run pytest (111 tests, all passing)
make test
# Train one combo end-to-end (BC → RL → eval, ~1h on 2 cores)
make DRIVE=differential WORLD=field
# Train all 4 combos (~5h)
make train_all
# Eval an existing policy directory in gym
python -m training.eval --policy training/runs/rl_differential_field \
--max-flock 10 --max-steps 15000 --n-seeds 10 \
--drive-mode differential --world field
# Webots — analytical, GT bypass (this works for all combos)
HERDING_USE_GT=1 tools/run_webots.sh 5 strombom differential field
# Webots — BC with the 360° proto (currently the 140° proto is active;
# swap by editing protos/ShepherdDog.proto or use the 360° variant directly)
tools/run_webots.sh 5 bc differential field
# Headless full sweep (~80 min)
tools/webots_sweep.sh webots_sweep.log
# Train LSTM (sb3-contrib must be installed)
python -m training.rl.train_lstm \
--out training/runs/lstm_differential_field \
--total-timesteps 3000000 --use-webots-preset \
--drive-mode differential --world field
```
---
## Hardware/environment
- 3.8 GB RAM, 8 GB swap, 2 cores. Memory pressure is real — saw the
OS OOM-kill RL training during chained `train_all` once. If you re-run
full pipelines, monitor memory and consider splitting.
- Conda env: `tir` at `/home/jalf/miniconda3/envs/tir/`. Has SB3,
sb3-contrib, PyTorch, gymnasium. Webots controllers point to this
python via the new `runtime.ini` files.
- Webots installed at `/usr/local/webots/`. Headless mode requires
`xvfb-run -a` (no X display on this machine).
---
## What I'd suggest for a fresh attempt at the 140° LiDAR gap
If the user wants you to keep pushing on it, the highest-EV experiment
not yet tried is:
**Consensus tracker** — modify `herding/perception/sheep_tracker.py` to
require K consecutive detections within a small radius before promoting
a track to "real." Phantom tracks from sporadic wall returns wouldn't
survive the K-step consensus; real sheep continuously visible in FOV
would. The current `max_new_tracks_per_step=1` only rate-limits track creation:
a single detection is still enough to spawn a track immediately.
Implementation sketch: add a "candidate" track type that doesn't appear
in `get_positions()`. After K (e.g. 3-5) consecutive matched detections,
promote candidate → real track. Roughly 30-50 lines of code.
This is a tracker-level fix at deploy time only, so it wouldn't require
retraining the policies — v1 BC/RL should transfer cleanly if the tracker
output looks more like what they were trained on (one position per real
sheep, no phantoms).
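The candidate-to-track promotion described above can be sketched in isolation. This is a minimal sketch, not a patch for `sheep_tracker.py`: the class name `ConsensusGate`, the `match_radius` default, and the nearest-neighbour matching are all assumptions, and it deliberately ignores the existing `forget_steps` and pen-latch logic.

```python
import math


class ConsensusGate:
    """Promote a detection to a confirmed track only after K consecutive matches."""

    def __init__(self, k=3, match_radius=0.5):
        self.k = k                        # consecutive hits required
        self.match_radius = match_radius  # metres; assumed value
        self.candidates = []              # [x, y, hits]; hidden from the policy
        self.confirmed = []               # (x, y); what get_positions() would expose

    def update(self, detections):
        """Consume one step of (x, y) detections and return confirmed positions."""
        unmatched = list(self.candidates)
        next_candidates, confirmed = [], []
        for x, y in detections:
            # Find the nearest previous candidate within the match radius.
            best = None
            for cand in unmatched:
                d = math.hypot(x - cand[0], y - cand[1])
                if d <= self.match_radius and (best is None or d < best[0]):
                    best = (d, cand)
            hits = 1
            if best is not None:
                unmatched.remove(best[1])
                hits = best[1][2] + 1
            next_candidates.append([x, y, hits])
            if hits >= self.k:
                confirmed.append((x, y))
        # Candidates with no match this step are dropped, so the streak
        # really has to be consecutive.
        self.candidates = next_candidates
        self.confirmed = confirmed
        return confirmed


# A sheep seen every step is confirmed on step 3; a sporadic phantom never is.
gate = ConsensusGate(k=3)
gate.update([(1.0, 1.0)])
gate.update([(1.1, 1.0), (5.0, 5.0)])
print(gate.update([(1.2, 1.0)]))  # [(1.2, 1.0)]
```

In this simplified form a confirmed track also disappears after a single missed detection. A real integration would more likely hand confirmed tracks over to the tracker's existing lifetime handling, so that a briefly occluded sheep is not dropped.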
I would NOT recommend more architectural training experiments (DAgger
round 4, larger LSTM, etc.) — three independent approaches today already
showed the bottleneck is upstream of the policy.