# Autonomous Shepherd-Dog Herding (Webots + RL)

Group G25: *Diogo Costa, Johnny Fernandes, Nelson Neto*

A shepherd dog that herds 1–10 sheep through a 3 m gate into an
external pen. Two worlds (`field` rectangular, `field_round` circular),
two drives (`differential`, `mecanum`), and four deployable control
modes:

| Mode | Source | Role |
|---|---|---|
| `strombom` | Strömbom et al. (2014) collect/drive heuristic | Analytic baseline |
| `sequential` | Single-target "pin-and-push" | Alternative analytic baseline |
| `bc` | Behaviour cloning of the universal teacher | Imitation learning result |
| `rl` | KL-regularised PPO fine-tune of `bc` | Reward-driven refinement |

## Perception

The dog perceives sheep **only through its front-mounted 140° LiDAR**
(180 rays, 12 m max range; see `protos/ShepherdDog.proto`). Each
control step:

1. Read `lidar.getRangeImage()`,
2. Cluster returns into world-frame `(x, y)` estimates
   (`herding/perception/lidar_perception.py`),
3. Fold them into a multi-target tracker that maintains last-seen
   positions for sheep currently outside the FOV
   (`herding/perception/sheep_tracker.py`).
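
Step 2 of the list above can be sketched roughly as follows. This is a minimal illustration, not the code in `lidar_perception.py`: the function name, `SHEEP_RADIUS = 0.3`, `CLUSTER_GAP = 0.5`, the left-to-right ray ordering, and applying the radius correction to the cluster centroid are all assumptions made for the example.

```python
import math

FOV = math.radians(140.0)   # from the text: 140 degree front LiDAR
MAX_RANGE = 12.0            # from the text: 12 m max range
SHEEP_RADIUS = 0.3          # hypothetical value (m), for illustration only
CLUSTER_GAP = 0.5           # hypothetical max spacing (m) between neighbouring hits


def scan_to_centroids(ranges, dog_x, dog_y, dog_heading):
    """Cluster one range image into world-frame sheep-centre estimates."""
    n = len(ranges)
    clusters, current = [], []
    for i, r in enumerate(ranges):
        if math.isfinite(r) and r < MAX_RANGE:
            # Assumption: ray 0 is the leftmost ray and the scan sweeps right.
            bearing = dog_heading + FOV / 2 - FOV * i / max(n - 1, 1)
            hit = (dog_x + r * math.cos(bearing), dog_y + r * math.sin(bearing))
            # A large jump between consecutive hits starts a new cluster.
            if current and math.dist(current[-1], hit) > CLUSTER_GAP:
                clusters.append(current)
                current = []
            current.append(hit)
        elif current:
            # A ray with no return closes the cluster being built.
            clusters.append(current)
            current = []
    if current:
        clusters.append(current)

    centroids = []
    for pts in clusters:
        cx = sum(p[0] for p in pts) / len(pts)
        cy = sum(p[1] for p in pts) / len(pts)
        # Surface-to-centre correction: the LiDAR only sees the near surface,
        # so push the centroid away from the dog by one sheep radius.
        d = math.hypot(cx - dog_x, cy - dog_y)
        k = (d + SHEEP_RADIUS) / d if d > 0 else 1.0
        centroids.append((dog_x + k * (cx - dog_x), dog_y + k * (cy - dog_y)))
    return centroids
```

Gap-based clustering is only one option; any method that groups neighbouring returns would fit the same slot.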

**LiDAR validation** (intermediate-goal item v from `docs/project.md`):
during development a diagnostic-dump controller captured 80 real
Webots scans plus the ground-truth (GT) sheep positions. Comparing
detections against GT showed that clustered centroids match GT positions
within 0.15 m after the +SHEEP_RADIUS surface-to-centre correction,
i.e. the LiDAR pipeline produces correct sheep-position estimates
from the real Webots scan, validating the sensor for the herding
task.

The tracker outputs a `{name: (x, y)}` dict shaped exactly like the
prior receiver-based one, so Strömbom, Sequential, and the BC obs
builder all run unchanged on top of it. The 2D Gymnasium env
raycasts sheep discs at training time
(`herding/perception/lidar_sim.py`), so demos collected in the env
match the perception the deployed controller sees in Webots.
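
To make "raycasts sheep discs" concrete, here is a minimal sketch of a 2D ray-disc range query and a scan built from it. It is not the code in `lidar_sim.py`; the function names, the `(cx, cy, radius)` disc tuples, and the left-to-right ray ordering are assumptions for the example.

```python
import math


def ray_disc_range(ox, oy, angle, cx, cy, radius):
    """Distance along a ray from (ox, oy) to the first hit on a disc, or inf."""
    dx, dy = math.cos(angle), math.sin(angle)
    fx, fy = cx - ox, cy - oy
    along = fx * dx + fy * dy                    # centre projected onto the ray
    perp_sq = fx * fx + fy * fy - along * along  # squared ray-to-centre distance
    if perp_sq > radius * radius:
        return math.inf                          # ray passes beside the disc
    t = along - math.sqrt(radius * radius - perp_sq)
    # Negative t means the disc is behind the origin (or the origin is inside
    # it); this sketch reports both cases as "no return".
    return t if t >= 0.0 else math.inf


def simulate_scan(ox, oy, heading, discs, n_rays=180,
                  fov=math.radians(140.0), max_range=12.0):
    """Range image over `discs`, a list of (cx, cy, radius) tuples."""
    scan = []
    for i in range(n_rays):
        # Assumption: ray 0 is the leftmost ray and the scan sweeps right.
        angle = heading + fov / 2 - fov * i / (n_rays - 1)
        r = min((ray_disc_range(ox, oy, angle, cx, cy, rad)
                 for cx, cy, rad in discs), default=math.inf)
        scan.append(r if r <= max_range else math.inf)
    return scan
```

Returning `inf` for rays with no hit mirrors what a Webots range image reports for no return, so the same clustering code can consume both.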

Privileged ground-truth perception is available for ablation via
`HerdingEnv(use_lidar=False)`.

## Quick start

```bash
# 1. Set up the Python env (any venv with PyTorch + SB3)
pip install -r training/requirements.txt

# 2. Smoke test (126 pytest cases, < 1 s)
make test

# 3. Reproduce a full pipeline (DRIVE+WORLD specific, ~1 h CPU)
make DRIVE=differential WORLD=field          # demos -> bc -> rl -> eval
make DRIVE=differential WORLD=field_round
make DRIVE=mecanum      WORLD=field          # see note below
make train_all                               # all 4 combos sequentially

# Individual stages (each rebuilds upstream artefacts if missing):
make DRIVE=differential WORLD=field bc_demos   # sim demos
make DRIVE=differential WORLD=field bc         # behaviour clone
make DRIVE=differential WORLD=field rl         # KL-PPO fine-tune
make DRIVE=differential WORLD=field eval       # 10-seed env eval

# 4. Run in Webots: interactive picker (recommended starting point)
tools/webots_menu.sh
#   Prompts for mode / drive / world / LiDAR FOV / number of dogs /
#   flock size / perception (LiDAR vs GT) / headless, then dispatches.

# Or invoke the launcher directly:
tools/run_webots.sh 10 bc differential field          # BC, diff, rect field
tools/run_webots.sh 10 rl differential field_round    # RL, diff, round field
tools/run_webots.sh 5 strombom differential field     # analytic baseline
HERDING_USE_GT=1 tools/run_webots.sh 5 strombom differential field
                                                      # GT bypass ablation
HERDING_LIDAR=360 tools/run_webots.sh 5 bc differential field
                                                      # 360° FOV ablation
HERDING_NDOGS=2 HERDING_AXIS_LEAK=0.3 tools/run_webots.sh 5 strombom differential field
                                                      # dual-shepherd axis split
```
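
For scripted (non-interactive) runs, the launcher invocations above can be assembled from Python. The helper below is a hypothetical convenience, not part of the repository; it assumes only what the examples show: positional arguments in the order sheep count, mode, drive, world, plus the four `HERDING_*` environment variables.

```python
import os
import subprocess


def launch_command(n_sheep, mode, drive, world, *, use_gt=False,
                   lidar_fov=None, n_dogs=1, axis_leak=None):
    """Build (argv, env_overrides) for tools/run_webots.sh (hypothetical helper)."""
    argv = ["tools/run_webots.sh", str(n_sheep), mode, drive, world]
    env = {}
    if use_gt:
        env["HERDING_USE_GT"] = "1"              # GT bypass ablation
    if lidar_fov is not None:
        env["HERDING_LIDAR"] = str(lidar_fov)    # e.g. 360 for the FOV ablation
    if n_dogs != 1:
        env["HERDING_NDOGS"] = str(n_dogs)       # dual-shepherd axis split
    if axis_leak is not None:
        env["HERDING_AXIS_LEAK"] = str(axis_leak)
    return argv, env


def run(argv, env):
    """Run the launcher with the overrides layered on the current environment."""
    return subprocess.run(argv, env={**os.environ, **env}, check=True)
```

For example, `run(*launch_command(5, "strombom", "differential", "field", n_dogs=2, axis_leak=0.3))` corresponds to the dual-shepherd line above.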

`make help` lists every Makefile target and the overridable hyperparameters.

**Mecanum note**: `ShepherdDogMecanum.proto` uses physical roller
hinges in Webots. The Webots calibration shows ~60% strafe efficiency
and ~28% backward bleed compared to textbook mecanum; the gym
kinematics in `HERDING_MEC_WEBOTS` are tuned to match. **Mecanum BC/RL
policies need to be retrained against this preset**; see the retrain
flow in the Mecanum results section below.

## Documentation map

- This README is the project overview: architecture, quick start, and
  headline results.
- `training/README.md` has the command-level training and evaluation
  details for demo collection, BC, PPO fine-tuning, and policy artifacts.
- `docs/project.md` is the original course proposal/goals document, kept
  for traceability rather than as run instructions.

## Layout

```
herding/                      perception / control / world primitives
  config.py                   frozen dataclasses for all tunable parameters;
                              named presets HERDING_DEFAULT / HERDING_WEBOTS /
                              HERDING_MEC_WEBOTS
  world/
    geometry.py               field/pen constants, world-shape switch
    diffdrive.py              differential + mecanum kinematics
    flocking_sim.py           Reynolds + Strömbom 2014 sheep dynamics
  perception/
    lidar_sim.py              fast 2D raycast for the gym env
    lidar_perception.py       scan → world-frame cluster centroids + filters
    sheep_tracker.py          multi-target NN tracker with FOV memory
                              and the consensus-promotion stage
    obs.py                    32-D order-invariant observation builder
  control/
    strombom.py               canonical CoM collect/drive heuristic
                              (round-world aware)
    sequential.py             single-target "pin-and-push" alternative
    universal.py              teacher used for BC demo collection
                              (Strömbom + mecanum omega + straggler recovery)
    active_scan.py            rotate-on-empty + walk-to-centre fallback
    modulation.py             shared near-sheep speed-modulation helper

controllers/
  sheep/sheep.py              Webots sheep controller
  shepherd_dog/
    shepherd_dog.py           Webots dog controller, mode-switched
    policy_loader.py          SB3 PPO / RecurrentPPO loader with frame stack

training/
  herding_env.py              Gymnasium env (LiDAR + tracker by default)
  bc/collect.py               sim demos via the active-scan teacher
  bc/pretrain.py              supervised BC into MLP
  rl/train.py                 KL-regularised PPO fine-tune of BC
  rl/train_lstm.py            RecurrentPPO variant (ablation)
  eval.py                     analytic + learned policy comparison harness
  runs/                       checkpoints (whitelisted in .gitignore)
  requirements.txt

tests/                        126 pytest cases, < 1 s on CPU

tools/
  run_webots.sh               launch Webots with N sheep + chosen mode + world
  webots_sweep.sh             headless sweep across modes × drives × worlds
  webots_sweep_gt.sh          same with HERDING_USE_GT=1 (perfect perception)
  calibrate_mecanum.sh        measure mecanum body velocity vs gym prediction
  gen_mecanum_wheels.py       regenerate the 32 mecanum roller hinges
  benchmark_lidar.py          tracker quality benchmark

Makefile                      pipeline orchestrator
                              (make DRIVE=… WORLD=… rl, make train_all, …)

worlds/
  field.wbt                   rectangular world (3 m gate, external pen)
  field_round.wbt             circular world (radius 15 m, same pen)

protos/
  Sheep.proto                 sheep robot
  ShepherdDog.proto           diff-drive dog, 140° LiDAR
  ShepherdDog360.proto        diff-drive dog, 360° LiDAR (ablation)
  ShepherdDogMecanum.proto    4-wheel mecanum with physical roller hinges

docs/project.md               original course proposal/goals
```

## Shared low-level control

Every dog mode (Strömbom, Sequential, BC, RL) routes its action
through `herding/control/modulation.py:modulate_speed_near_sheep`,
which scales action magnitude down when within ~2.5 m of the nearest
tracked sheep. This stops the dog from charging in at full speed and
scattering the flock. Direction (intent) is preserved.
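
A minimal sketch of this kind of modulation is shown below. It is not the repository's `modulate_speed_near_sheep`: the function name, the linear ramp, the `MIN_SCALE = 0.3` floor, and the tuple-shaped action are assumptions; only the ~2.5 m radius and the "scale magnitude, keep direction" behaviour come from the text.

```python
import math

SLOW_RADIUS = 2.5   # from the text: slow down within ~2.5 m of the nearest sheep
MIN_SCALE = 0.3     # hypothetical floor so the dog never stops completely


def scale_action_near_sheep(action, dog_xy, sheep_positions):
    """Shrink an action vector uniformly when the nearest tracked sheep is close.

    `sheep_positions` is a {name: (x, y)} dict like the tracker output.
    """
    if not sheep_positions:
        return tuple(action)
    nearest = min(math.dist(dog_xy, p) for p in sheep_positions.values())
    if nearest >= SLOW_RADIUS:
        return tuple(action)
    # Linear ramp: MIN_SCALE at contact, 1.0 at SLOW_RADIUS (an assumed shape).
    scale = MIN_SCALE + (1.0 - MIN_SCALE) * nearest / SLOW_RADIUS
    # Every component gets the same factor, so the action's direction is kept.
    return tuple(scale * a for a in action)
```

Because every component is multiplied by the same factor, the ratio between components (the intent) is unchanged.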

All modes also share the same EMA action smoother in
`controllers/shepherd_dog/shepherd_dog.py:ACTION_SMOOTH = 0.55`.
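
An exponential moving average (EMA) over actions can be sketched as below. The class is hypothetical, and whether `ACTION_SMOOTH` weights the previous output (as assumed here) or the new action is not stated in this README.

```python
ACTION_SMOOTH = 0.55  # value from the text; its exact role is assumed below


class ActionSmoother:
    """EMA over successive actions (hypothetical helper for illustration)."""

    def __init__(self, alpha=ACTION_SMOOTH):
        self.alpha = alpha   # assumption: weight on the previous smoothed action
        self.state = None

    def step(self, action):
        if self.state is None:
            # First call: nothing to blend with, pass the action through.
            self.state = tuple(action)
        else:
            self.state = tuple(
                self.alpha * prev + (1.0 - self.alpha) * new
                for prev, new in zip(self.state, action)
            )
        return self.state
```

Under this assumption, a sudden change in the commanded action reaches the wheels gradually, with 45% of the gap closed on each control step.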

## Results: Webots end-to-end, canonical 140° LiDAR

Each cell is the step at which the run reported "OK at step X", meaning
the dog penned all N sheep. Every cell is a single trial with
`HERDING_USE_GT=0` (LiDAR perception, no ground-truth bypass) and the
default consensus tracker.

### Differential drive

| Mode | World | n=5 | n=10 |
|---|---|---:|---:|
| Strömbom | field | 7528 | 11620 |
| Strömbom | field_round | 8611 | 10339 |
| Sequential | field | 7135 | 16843 |
| Sequential | field_round | 6019 | 8494 |
| BC | field | 11698 | 15079 |
| BC | field_round | 7234 | 11320 |
| RL | field | 10039 | 13954 |
| RL | field_round | 5803 | 9151 |

RL is **strictly faster than BC** on every comparable cell.
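
The comparison can be checked directly from the table. The snippet below copies the BC and RL rows and computes, per cell, the fraction of BC's steps that RL saves; the variable and function names are illustrative only.

```python
# Steps to pen all sheep, copied from the BC and RL rows of the table above.
STEPS = {
    ("field", 5): {"bc": 11698, "rl": 10039},
    ("field", 10): {"bc": 15079, "rl": 13954},
    ("field_round", 5): {"bc": 7234, "rl": 5803},
    ("field_round", 10): {"bc": 11320, "rl": 9151},
}


def rl_saving(cell):
    """Fraction of BC's steps that RL saves in one (world, n) cell."""
    return 1.0 - STEPS[cell]["rl"] / STEPS[cell]["bc"]


savings = {cell: rl_saving(cell) for cell in STEPS}
for (world, n), s in sorted(savings.items()):
    print(f"{world:12s} n={n:<2d} RL saves {100 * s:.1f}% of BC steps")
```

The savings range from roughly 7% to 20%. Since each cell is a single trial, these figures indicate a direction rather than a statistically tested margin.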

### LiDAR vs GT bypass (diff drive)

GT bypass replaces the LiDAR tracker with perfect emitter positions.
LiDAR is the default; GT is a perception ablation
(`HERDING_USE_GT=1`):

| Mode | World | n=5 LiDAR | n=5 GT | n=10 LiDAR | n=10 GT |
|---|---|---:|---:|---:|---:|
| Strömbom | field | 7528 | **5254** | 11620 | **7342** |
| Strömbom | field_round | 8611 | **3631** | 10339 | **7084** |
| Sequential | field | **7135** | 11092 | 16843 | **8698** |
| Sequential | field_round | 6019 | **3454** | 8494 | **7324** |

GT is generally faster (perfect perception → fewer wasted steps).
Sequential n=5 / field is the one cell where GT is *slower*: its
straggler heuristic appears to over-correct when the dog has full
information.
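
The same kind of check works for the perception ablation. The snippet below copies the table and lists the cells where GT took more steps than LiDAR; names are illustrative.

```python
# (mode, world, n) -> (LiDAR steps, GT steps), copied from the table above.
LIDAR_VS_GT = {
    ("Strömbom", "field", 5): (7528, 5254),
    ("Strömbom", "field", 10): (11620, 7342),
    ("Strömbom", "field_round", 5): (8611, 3631),
    ("Strömbom", "field_round", 10): (10339, 7084),
    ("Sequential", "field", 5): (7135, 11092),
    ("Sequential", "field", 10): (16843, 8698),
    ("Sequential", "field_round", 5): (6019, 3454),
    ("Sequential", "field_round", 10): (8494, 7324),
}


def gt_speedup(key):
    """LiDAR steps divided by GT steps; above 1.0 means GT was faster."""
    lidar, gt = LIDAR_VS_GT[key]
    return lidar / gt


slower_with_gt = [key for key in LIDAR_VS_GT if gt_speedup(key) < 1.0]
print(slower_with_gt)
```

Only the Sequential n=5 / field cell ends up in `slower_with_gt`, matching the bold entries in the table.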

### Mecanum (differential is the headline)

`ShepherdDogMecanum.proto` has 32 physical roller hinges giving true
omnidirectional motion in Webots; `tools/calibrate_mecanum.sh`
confirms the X-pattern. Calibration shows strafing reaches only ~60% of
the textbook speed (versus ~89% for forward motion), so the gym needs
to match the imperfect physical mecanum for the trained policy to
compensate.
`HERDING_MEC_WEBOTS` is the matched preset; `training/bc/collect.py`
and `training/rl/train.py` auto-select it for mecanum runs. Mecanum
policies were trained on the textbook gym, so they need to be
retrained against `HERDING_MEC_WEBOTS` (≈ 2 h per combo, 4 combos):

```bash
python -m training.bc.collect \
    --drive-mode mecanum --world field --use-webots-preset \
    --out training/bc/demos_mecanum_field.npz
python -m training.bc.pretrain \
    --demos training/bc/demos_mecanum_field.npz \
    --out training/runs/bc_mecanum_field
python -m training.rl.train \
    --bc training/runs/bc_mecanum_field \
    --out training/runs/rl_mecanum_field \
    --drive-mode mecanum --world field --use-webots-preset
```

Repeat for `field_round`.
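
To illustrate what "matching the imperfect physical mecanum" could look like, the sketch below applies the two calibrated efficiencies to the textbook mecanum forward kinematics. It is not the `HERDING_MEC_WEBOTS` preset: the wheel radius, chassis dimensions, wheel ordering, and the choice to model the mismatch as plain per-axis gains are assumptions, and the ~28% backward bleed is deliberately left out because this README does not define it precisely.

```python
WHEEL_RADIUS = 0.05   # hypothetical (m)
HALF_LENGTH = 0.20    # hypothetical half wheelbase (m)
HALF_WIDTH = 0.15     # hypothetical half track width (m)
FORWARD_EFF = 0.89    # from the text: ~89% of textbook forward speed
STRAFE_EFF = 0.60     # from the text: ~60% of textbook strafe speed


def textbook_body_velocity(w_fl, w_fr, w_rl, w_rr):
    """Ideal X-pattern mecanum kinematics: wheel speeds (rad/s) to (vx, vy, omega).

    Convention assumed here: x forward, y left, omega counter-clockwise.
    """
    r = WHEEL_RADIUS
    vx = r / 4.0 * (w_fl + w_fr + w_rl + w_rr)
    vy = r / 4.0 * (-w_fl + w_fr + w_rl - w_rr)
    omega = r / (4.0 * (HALF_LENGTH + HALF_WIDTH)) * (-w_fl + w_fr - w_rl + w_rr)
    return vx, vy, omega


def calibrated_body_velocity(w_fl, w_fr, w_rl, w_rr):
    """Textbook velocity with the measured per-axis efficiencies applied."""
    vx, vy, omega = textbook_body_velocity(w_fl, w_fr, w_rl, w_rr)
    # Assumption: the sim-to-Webots gap is a simple gain on each translation axis.
    return FORWARD_EFF * vx, STRAFE_EFF * vy, omega
```

A policy trained against the calibrated model sees a pure strafe command produce noticeably less sideways motion than a forward command of the same wheel speed, which is the effect the retraining is meant to absorb.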

## License

Educational project for the *Topics in Intelligent Robotics* course.