a584a034e9
Repo hygiene pass after a long working session.

Files removed:
* stage1_train.log — runtime training log (~125 KB), shouldn't have
been tracked.
* training/bc/demos.npz — orphan default-name demos file from before
the world+drive-suffixed naming convention took over; no script
references it.
* training/runs/bc_dagger{1,2}_differential_field/policy.zip — failed
DAgger experiment artifacts. Per `memory/dagger_results.md`, the whole
DAgger experiment scored 0/5 on Webots transfer, and these checkpoints
have no consumers.

Untracked files (no git change), also deleted from disk:
* Root-level runtime logs (43 *.log files, all unused — gitignored now).
* training/bc/{combined,dagger}*.npz (5 huge demo blobs, 2.6 GB
reclaimed; not committed).
* training/bc/v1/ (2.6 GB backup of pre-DAgger demos; reclaimed).
* training/runs/at_20260426_*/ (orphan timestamped runs; reclaimed).
* All `__pycache__/` directories.

Dead code removed:
* `herding/control/strombom.py::compute_action_debug` — no callers
anywhere in the repo.
* `herding/control/sequential.py::compute_action_debug` — same.
* `herding/control/universal.py::compute_action_diff` — same.

.gitignore extended to cover:
* All *.log files (training/eval/webots logs are runtime artifacts).
* training/bc/*.npz (re-collectable on demand by `make bc_demos`).
* training/bc/v1/.
* .pytest_cache, *.pyc, .claude/.

README refreshed:
* Mecanum + round-world coverage in the headline.
* Quick-start updated for DRIVE/WORLD-suffixed Makefile targets,
GT-bypass example, and the mecanum-retrain caveat.
* Layout reflects the actual current tree (config.py, both protos,
both worlds, all tools).
* Results table replaced with the Webots end-to-end numbers from
the 2026-05-16 sweep (8/8 diff combos + LiDAR/GT comparison).

Verification: 126 pytest cases still pass, the same count as before
the change, so the dead-code removal lost no test coverage.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
232 lines
9.9 KiB
Markdown
# Autonomous Shepherd-Dog Herding (Webots + RL)

Group G25: *Diogo Costa, Johnny Fernandes, Nelson Neto*

A shepherd-dog robot that herds 1–10 sheep through a 3 m gate into an
external pen. The project covers two worlds (`field`, rectangular, and
`field_round`, circular), two drives (`differential`, `mecanum`), and
four deployable control modes:

| Mode | Source | Role |
|---|---|---|
| `strombom` | Strömbom et al. (2014) collect/drive heuristic | Analytic baseline |
| `sequential` | Single-target "pin-and-push" | Alternative analytic baseline |
| `bc` | Behaviour cloning of the universal teacher | Imitation learning result |
| `rl` | KL-regularised PPO fine-tune of `bc` | Reward-driven refinement |

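For readers new to the `strombom` baseline, the collect/drive rule from Strömbom et al. (2014) can be sketched as follows. This is a minimal illustration of the published heuristic, not the code in `herding/control/strombom.py`: the function name `strombom_target` is hypothetical, and `r_a` (the interaction distance in the paper's notation) is given an assumed default of 2 m.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def _unit(dx: float, dy: float) -> Point:
    """Return (dx, dy) scaled to length 1, or (0, 0) for a zero vector."""
    norm = math.hypot(dx, dy)
    return (dx / norm, dy / norm) if norm > 0 else (0.0, 0.0)


def strombom_target(sheep: Sequence[Point], goal: Point, r_a: float = 2.0) -> Tuple[str, Point]:
    """Pick the dog's next waypoint with a Strömbom-style collect/drive rule.

    Hypothetical helper; r_a is an assumed interaction distance in metres.
    Returns ("drive" or "collect", waypoint).
    """
    if not sheep:
        raise ValueError("need at least one sheep")
    n = len(sheep)
    gcm = (sum(x for x, _ in sheep) / n, sum(y for _, y in sheep) / n)
    # The flock counts as cohesive if every sheep is within f(N) = r_a * N^(2/3) of its centre.
    f_n = r_a * n ** (2.0 / 3.0)
    dists = [math.hypot(x - gcm[0], y - gcm[1]) for x, y in sheep]

    if max(dists) <= f_n:
        # Drive: stand behind the centre of mass on the goal -> centre line,
        # r_a * sqrt(N) away from the centre, and push the whole flock.
        ux, uy = _unit(gcm[0] - goal[0], gcm[1] - goal[1])
        d = r_a * math.sqrt(n)
        return "drive", (gcm[0] + ux * d, gcm[1] + uy * d)

    # Collect: stand r_a behind the furthest sheep on the centre -> sheep line,
    # so pushing it moves it back toward the flock.
    far = sheep[dists.index(max(dists))]
    ux, uy = _unit(far[0] - gcm[0], far[1] - gcm[1])
    return "collect", (far[0] + ux * r_a, far[1] + uy * r_a)
```

A compact flock yields a "drive" waypoint behind the flock relative to the goal; a single far-off straggler switches the rule to "collect".
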
## Perception

The dog perceives sheep **only through its front-mounted 140° LiDAR**
(180 rays, 12 m maximum range; see `protos/ShepherdDog.proto`). On each
control step, the controller:

1. reads `lidar.getRangeImage()`;
2. clusters the returns into world-frame `(x, y)` estimates
   (`herding/perception/lidar_perception.py`);
3. folds those estimates into a multi-target tracker that keeps the
   last-seen positions of sheep currently outside the FOV
   (`herding/perception/sheep_tracker.py`).

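Steps 1 and 2 can be illustrated by a minimal sketch that turns a range image into world-frame sheep centres. It is not the code in `lidar_perception.py`: the ray ordering (ray 0 at +FOV/2), the 0.3 m gap threshold, the `SHEEP_RADIUS` value, and the function name are all assumptions. Because a LiDAR hit lies on the near surface of a sheep, each cluster centroid is pushed outward along its bearing by one sheep radius, the same idea as the surface-to-centre correction in the validation note.

```python
import math
from typing import List, Optional, Sequence, Tuple

SHEEP_RADIUS = 0.4  # hypothetical value in metres, not the project's constant
FOV = math.radians(140.0)
MAX_RANGE = 12.0


def scan_to_centroids(
    ranges: Sequence[float],
    pose: Tuple[float, float, float],
    gap: float = 0.3,
) -> List[Tuple[float, float]]:
    """Cluster a LiDAR range image into world-frame sheep-centre estimates.

    pose = (x, y, heading) of the dog in the world frame (assumed convention).
    Rays are assumed to sweep from +FOV/2 (ray 0) to -FOV/2 (last ray).
    """
    x0, y0, th = pose
    n = len(ranges)
    step = FOV / (n - 1) if n > 1 else 0.0

    clusters: List[List[Tuple[float, float]]] = []
    prev: Optional[Tuple[float, float]] = None
    for i, r in enumerate(ranges):
        if not math.isfinite(r) or r >= MAX_RANGE:
            prev = None  # no return: close any open cluster
            continue
        a = FOV / 2 - i * step  # ray angle in the robot frame
        p = (r * math.cos(a), r * math.sin(a))
        if prev is not None and math.dist(p, prev) < gap:
            clusters[-1].append(p)  # close to the previous hit: same sheep
        else:
            clusters.append([p])  # start a new cluster
        prev = p

    centres = []
    for pts in clusters:
        cx = sum(px for px, _ in pts) / len(pts)
        cy = sum(py for _, py in pts) / len(pts)
        # Hits lie on the near surface: move the centroid out by one radius.
        bearing = math.atan2(cy, cx)
        cx += SHEEP_RADIUS * math.cos(bearing)
        cy += SHEEP_RADIUS * math.sin(bearing)
        # Robot frame -> world frame.
        wx = x0 + cx * math.cos(th) - cy * math.sin(th)
        wy = y0 + cx * math.sin(th) + cy * math.cos(th)
        centres.append((wx, wy))
    return centres
```
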
**LiDAR validation** (intermediate-goal item v from `docs/project.md`):
during development, a diagnostic-dump controller captured 80 real
Webots scans together with the ground-truth sheep positions. Comparing
the detections against GT showed that the clustered centroids match GT
positions to within 0.15 m once the +SHEEP_RADIUS surface-to-centre
correction is applied. The LiDAR pipeline therefore produces accurate
sheep-position estimates from real Webots scans, which validates the
sensor for the herding task.

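A GT comparison of this kind can be approximated with a nearest-neighbour error check. The sketch below is hypothetical and is not the diagnostic-dump tooling used during development: it matches each detection to its closest ground-truth sheep and reports the worst-case distance, which is one reasonable reading of "within 0.15 m".

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def worst_detection_error(detections: Sequence[Point], ground_truth: Dict[str, Point]) -> float:
    """Largest distance from any detection to its nearest ground-truth sheep.

    Hypothetical helper: each detection is matched to its closest GT position
    (no one-to-one assignment), which is enough for a sanity check.
    """
    if not detections:
        return 0.0
    if not ground_truth:
        return math.inf  # detections with nothing to match against
    return max(min(math.dist(d, g) for g in ground_truth.values()) for d in detections)
```

A fuller evaluation would also count missed sheep (GT positions with no nearby detection), which this check ignores.
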
The tracker outputs a `{name: (x, y)}` dict shaped exactly like the one
produced by the earlier receiver-based perception, so Strömbom,
Sequential, and the BC observation builder all run unchanged on top of
it. At training time, the 2D Gymnasium env raycasts sheep discs with
`herding/perception/lidar_sim.py`, so demos collected in the env match
the perception the deployed controller gets in Webots.

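The tracker's contract (detections in, a `{name: (x, y)}` dict out, with last-seen positions retained for sheep outside the FOV) can be illustrated with a greedy nearest-neighbour tracker. This is a sketch under assumptions, not `sheep_tracker.py`: the class name `LastSeenTracker`, the `gate` distance, and the `sheep_<k>` track names are hypothetical, and the consensus-promotion stage is omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


class LastSeenTracker:
    """Greedy nearest-neighbour tracker that remembers unseen sheep.

    Hypothetical sketch: tracks with no matching detection keep their
    last-seen position instead of being dropped.
    """

    def __init__(self, gate: float = 1.0) -> None:
        self.gate = gate  # max distance (m) for a detection to update a track
        self.tracks: Dict[str, Point] = {}
        self._next_id = 0

    def update(self, detections: Sequence[Point]) -> Dict[str, Point]:
        unmatched: List[Point] = list(detections)
        # Match each existing track to its nearest remaining detection.
        for name, pos in list(self.tracks.items()):
            if not unmatched:
                break  # remaining tracks stay at their last-seen positions
            best = min(unmatched, key=lambda d: math.dist(d, pos))
            if math.dist(best, pos) <= self.gate:
                self.tracks[name] = best
                unmatched.remove(best)
        # Leftover detections start new tracks.
        for det in unmatched:
            self.tracks[f"sheep_{self._next_id}"] = det
            self._next_id += 1
        return dict(self.tracks)
```

Keeping stale positions rather than deleting tracks is what lets a front-facing 140° sensor drive controllers that expect the whole flock in every step's dict.
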
Privileged ground-truth perception is available for ablation:
`HerdingEnv(use_lidar=False)` in the training env, or `HERDING_USE_GT=1`
when running in Webots.

## Quick start

```bash
# 1. Set up the Python env (any venv with PyTorch + SB3)
pip install -r training/requirements.txt

# 2. Smoke test (126 pytest cases, < 1 s)
make test

# 3. Reproduce a full pipeline (DRIVE+WORLD specific, ~1 h CPU)
make DRIVE=differential WORLD=field              # demos -> bc -> rl -> eval
make DRIVE=differential WORLD=field_round
make DRIVE=mecanum WORLD=field                   # see note below
make train_all                                   # all 4 combos sequentially

# Individual stages (each rebuilds upstream artefacts if missing):
make DRIVE=differential WORLD=field bc_demos     # sim demos
make DRIVE=differential WORLD=field bc           # behaviour clone
make DRIVE=differential WORLD=field rl           # KL-PPO fine-tune
make DRIVE=differential WORLD=field eval         # 10-seed env eval

# 4. Run in Webots
tools/run_webots.sh 10 bc differential field          # BC, diff, rect field
tools/run_webots.sh 10 rl differential field_round    # RL, diff, round field
tools/run_webots.sh 5 strombom differential field     # analytic baseline
HERDING_USE_GT=1 tools/run_webots.sh 5 strombom differential field  # GT bypass for ablation
```

`make help` lists every target and the overridable hyperparameters.

**Mecanum note**: `ShepherdDogMecanum.proto` uses physical roller hinges
in Webots (committed 2026-05-16). The Webots calibration shows ~60%
strafe efficiency and ~28% backward bleed relative to textbook mecanum,
and the `HERDING_MEC_WEBOTS` gym kinematics preset is tuned to match.
**Mecanum BC/RL policies need to be retrained against this preset**; see
`memory/mecanum_proto_gap.md` for the 3-command flow. The v1 policies in
`training/runs/{bc,rl}_mecanum_*` predate the proto rewrite and will not
herd reliably in Webots until retrained.

## Documentation map

- This README is the project overview: architecture, quick start, and
  headline results.
- `training/README.md` has the command-level training and evaluation
  details for demo collection, BC, PPO fine-tuning, and policy artifacts.
- `docs/project.md` is the original course proposal/goals document, kept
  for traceability rather than as run instructions.

## Layout

```
herding/                     — perception / control / world primitives
  config.py                  — frozen dataclasses for all tunable parameters;
                               named presets HERDING_DEFAULT / HERDING_WEBOTS /
                               HERDING_MEC_WEBOTS
  world/
    geometry.py              field/pen constants, world-shape switch
    diffdrive.py             differential + mecanum kinematics
    flocking_sim.py          Reynolds + Strömbom 2014 sheep dynamics
  perception/
    lidar_sim.py             fast 2D raycast for the gym env
    lidar_perception.py      scan → world-frame cluster centroids + filters
    sheep_tracker.py         multi-target NN tracker with FOV memory
                             and the consensus-promotion stage
    obs.py                   32-D order-invariant observation builder
  control/
    strombom.py              canonical CoM collect/drive heuristic
                             (round-world aware)
    sequential.py            single-target "pin-and-push" alternative
    universal.py             teacher used for BC demo collection
                             (Strömbom + mecanum omega + straggler recovery)
    active_scan.py           rotate-on-empty + walk-to-centre fallback
    modulation.py            shared near-sheep speed-modulation helper

controllers/
  sheep/sheep.py             — Webots sheep controller
  shepherd_dog/
    shepherd_dog.py          — Webots dog controller, mode-switched
    policy_loader.py         — SB3 PPO / RecurrentPPO loader with frame stack

training/
  herding_env.py             — Gymnasium env (LiDAR + tracker by default)
  bc/collect.py              — sim demos via the active-scan teacher
  bc/pretrain.py             — supervised BC into MLP
  rl/train.py                — KL-regularised PPO fine-tune of BC
  rl/train_lstm.py           — RecurrentPPO variant (ablation)
  eval.py                    — analytic + learned policy comparison harness
  runs/                      — checkpoints (whitelisted in .gitignore)
  requirements.txt

tests/                       — 126 pytest cases, < 1 s on CPU

tools/
  run_webots.sh              — launch Webots with N sheep + chosen mode + world
  webots_sweep.sh            — headless sweep across modes × drives × worlds
  webots_sweep_gt.sh         — same with HERDING_USE_GT=1 (perfect perception)
  calibrate_mecanum.sh       — measure mecanum body velocity vs gym prediction
  gen_mecanum_wheels.py      — regenerate the 32 mecanum roller hinges
  benchmark_lidar.py         — tracker quality benchmark

Makefile                     — pipeline orchestrator
                               (make DRIVE=… WORLD=… rl, make train_all, …)

worlds/
  field.wbt                  — rectangular world (3 m gate, external pen)
  field_round.wbt            — circular world (radius 15 m, same pen)

protos/
  Sheep.proto                — sheep robot
  ShepherdDog.proto          — diff-drive dog, 140° LiDAR
  ShepherdDog360.proto       — diff-drive dog, 360° LiDAR (ablation)
  ShepherdDogMecanum.proto   — 4-wheel mecanum with physical roller hinges

docs/project.md              — original course proposal/goals
```

## Shared low-level control

Every dog mode (Strömbom, Sequential, BC, RL) routes its action
through `herding/control/modulation.py:modulate_speed_near_sheep`,
which scales the action magnitude down whenever the dog is within
~2.5 m of the nearest tracked sheep. This stops the dog from charging
in at full speed and scattering the flock. Only the magnitude changes;
the action's direction (the controller's intent) is preserved.

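The near-sheep rule can be sketched as a magnitude-only scale factor. Only the ~2.5 m radius and the direction-preserving behaviour come from the description above; the linear ramp, the `min_scale` floor, and the name `modulate_speed_sketch` are assumptions, and the real `modulate_speed_near_sheep` may use a different profile.

```python
import math
from typing import Iterable, Tuple


def modulate_speed_sketch(
    action: Tuple[float, float],
    dog_xy: Tuple[float, float],
    sheep_xy: Iterable[Tuple[float, float]],
    radius: float = 2.5,
    min_scale: float = 0.3,
) -> Tuple[float, float]:
    """Scale the action's magnitude down near the closest tracked sheep.

    Both components get the same factor, so the direction is unchanged.
    The linear ramp and min_scale are assumptions, not the project's values.
    """
    nearest = min((math.dist(dog_xy, s) for s in sheep_xy), default=math.inf)
    if nearest >= radius:
        return action  # no sheep close by: full speed
    # Ramp linearly from min_scale (touching a sheep) to 1.0 (at the radius).
    scale = min_scale + (1.0 - min_scale) * (nearest / radius)
    return (action[0] * scale, action[1] * scale)
```

The floor keeps the dog from stalling completely when it is pressed against a sheep, which it still needs to push.
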
All modes also share the same exponential-moving-average (EMA) action
smoother, set by `ACTION_SMOOTH = 0.55` in
`controllers/shepherd_dog/shepherd_dog.py`.

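An EMA smoother blends each new command with the previous smoothed output. The sketch assumes `ACTION_SMOOTH` weights the *previous* output, i.e. `a_t = 0.55 * a_prev + 0.45 * u_t`; the convention in `shepherd_dog.py` may be the reverse, and the class name `EmaSmoother` is hypothetical.

```python
from typing import Optional, Sequence, Tuple

ACTION_SMOOTH = 0.55  # value from the README; weighting convention assumed


class EmaSmoother:
    """Exponential moving average over action vectors (hypothetical sketch)."""

    def __init__(self, alpha: float = ACTION_SMOOTH) -> None:
        self.alpha = alpha  # weight on the previous smoothed action
        self.prev: Optional[Tuple[float, ...]] = None

    def __call__(self, action: Sequence[float]) -> Tuple[float, ...]:
        if self.prev is None:
            self.prev = tuple(action)  # first step: nothing to blend with
        else:
            self.prev = tuple(
                self.alpha * p + (1.0 - self.alpha) * a
                for p, a in zip(self.prev, action)
            )
        return self.prev
```

Under this convention a step change in the command is 1 − 0.55ⁿ complete after n steps (45%, about 70%, about 83% for n = 1, 2, 3), so the smoother adds only a few steps of lag.
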
## Results — Webots end-to-end, canonical 140° LiDAR

Each cell gives X from an "OK at step X" result: the step at which the
dog had penned all N sheep in a single trial, so lower is faster. All
runs use `HERDING_USE_GT=0` (LiDAR perception, no ground-truth bypass)
and the default consensus tracker.

### Differential drive

| Mode | World | n=5 | n=10 |
|---|---|---:|---:|
| Strömbom | field | 7528 | 11620 |
| Strömbom | field_round | 8611 | 10339 |
| Sequential | field | 7135 | 16843 |
| Sequential | field_round | 6019 | 8494 |
| BC | field | 11698 | 15079 |
| BC | field_round | 7234 | 11320 |
| RL | field | 10039 | 13954 |
| RL | field_round | 5803 | 9151 |

RL is **strictly faster than BC** in every cell (both worlds, both
flock sizes).

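The size of the BC-to-RL gain can be computed directly from the table. The snippet only reuses the step counts reported above; the dictionary layout and the `step_reduction` helper are illustrative choices.

```python
# Steps to pen all sheep (single trial per cell), copied from the table above.
BC = {("field", 5): 11698, ("field", 10): 15079,
      ("field_round", 5): 7234, ("field_round", 10): 11320}
RL = {("field", 5): 10039, ("field", 10): 13954,
      ("field_round", 5): 5803, ("field_round", 10): 9151}


def step_reduction(bc: dict, rl: dict) -> dict:
    """Percentage fewer steps RL needs than BC, per (world, n) cell."""
    return {cell: 100.0 * (bc[cell] - rl[cell]) / bc[cell] for cell in bc}


for (world, n), pct in step_reduction(BC, RL).items():
    print(f"{world:<12} n={n:<3} RL uses {pct:4.1f}% fewer steps")
```

The reduction ranges from about 7.5% (`field`, n=10) to about 19.8% (`field_round`, n=5).
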
### LiDAR vs GT bypass (diff drive)

GT bypass replaces the LiDAR tracker with perfect, emitter-broadcast
sheep positions. LiDAR is the default; GT is a perception ablation
(`HERDING_USE_GT=1`). Bold marks the faster result of each LiDAR/GT
pair:

| Mode | World | n=5 LiDAR | n=5 GT | n=10 LiDAR | n=10 GT |
|---|---|---:|---:|---:|---:|
| Strömbom | field | 7528 | **5254** | 11620 | **7342** |
| Strömbom | field_round | 8611 | **3631** | 10339 | **7084** |
| Sequential | field | **7135** | 11092 | 16843 | **8698** |
| Sequential | field_round | 6019 | **3454** | 8494 | **7324** |

GT is faster in seven of the eight cells, since perfect perception
wastes fewer steps. Sequential with n=5 on `field` is the one cell where
GT is *slower*; its straggler heuristic appears to over-correct when the
dog has full information.

### Mecanum (differential is the headline)

`ShepherdDogMecanum.proto` was rewritten on 2026-05-16 with 32 physical
roller hinges, which gives true omnidirectional motion in Webots
(`tools/calibrate_mecanum.sh` confirms the X-pattern). The calibration
measures ~60% strafe efficiency relative to textbook mecanum (versus
~89% for forward motion), so the v1 mecanum BC/RL policies, trained on
textbook gym mecanum, no longer herd reliably. The fix is staged but
has not been run: the gym now has a `HERDING_MEC_WEBOTS` preset that
matches Webots' physical mecanum, and `training/bc/collect.py` and
`training/rl/train.py` auto-select this preset for mecanum runs.
Retraining (≈ 2 h per combo, 4 combos) is the documented next step.

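To make the calibration numbers concrete, the sketch below applies the measured efficiencies as per-axis gains on textbook X-pattern mecanum kinematics. It is a simplification, not the `HERDING_MEC_WEBOTS` model: the wheel order (FL, FR, RL, RR), the geometry values, and the gain-only model are assumptions, and the ~28% backward bleed is not modelled.

```python
from typing import Tuple

# Illustrative geometry (assumed, not read from ShepherdDogMecanum.proto).
WHEEL_RADIUS = 0.05  # m
HALF_SUM = 0.25      # lx + ly: half wheelbase + half track, m

# Calibrated gains quoted in this README (fraction of textbook speed).
FORWARD_EFF = 0.89
STRAFE_EFF = 0.60

Wheels = Tuple[float, float, float, float]


def inverse_kinematics(vx: float, vy: float, wz: float) -> Wheels:
    """Body twist -> wheel speeds (rad/s) for FL, FR, RL, RR, textbook X-pattern."""
    k = HALF_SUM * wz
    return (
        (vx - vy - k) / WHEEL_RADIUS,
        (vx + vy + k) / WHEEL_RADIUS,
        (vx + vy - k) / WHEEL_RADIUS,
        (vx - vy + k) / WHEEL_RADIUS,
    )


def forward_kinematics(
    w: Wheels, forward_eff: float = 1.0, strafe_eff: float = 1.0
) -> Tuple[float, float, float]:
    """Wheel speeds -> body twist, with optional per-axis efficiency gains."""
    fl, fr, rl, rr = w
    vx = WHEEL_RADIUS / 4 * (fl + fr + rl + rr)
    vy = WHEEL_RADIUS / 4 * (-fl + fr + rl - rr)
    wz = WHEEL_RADIUS / (4 * HALF_SUM) * (-fl + fr - rl + rr)
    return vx * forward_eff, vy * strafe_eff, wz
```

Under this simplified model, a policy trained on textbook kinematics that commands a 1 m/s strafe gets only about 0.6 m/s in Webots, one way to see why the v1 mecanum policies no longer herd reliably.
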
## License

Educational project for the *Topics in Intelligent Robotics* course.