Autonomous Shepherd-Dog Herding (Webots + RL)
Group G25 — Diogo Costa, Johnny Fernandes, Nelson Neto
A shepherd dog that herds 1–10 sheep through a 3 m gate into an
external pen. Two worlds (field rectangular, field_round circular),
two drives (differential, mecanum), and four deployable control
modes:
| Mode | Source | Role |
|---|---|---|
| strombom | Strömbom et al. (2014) collect/drive heuristic | Analytic baseline |
| sequential | Single-target "pin-and-push" | Alternative analytic baseline |
| bc | Behaviour cloning of the universal teacher | Imitation learning result |
| rl | KL-regularised PPO fine-tune of bc | Reward-driven refinement |
Perception
The dog perceives sheep only through its front-mounted 140° LiDAR
(180 rays, 12 m max range; see protos/ShepherdDog.proto). Each
control step it:

- Reads lidar.getRangeImage(),
- Clusters the returns into world-frame (x, y) estimates
  (herding/perception/lidar_perception.py),
- Folds them into a multi-target tracker that maintains last-seen
  positions for sheep currently outside the FOV
  (herding/perception/sheep_tracker.py).
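The first two steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in lidar_perception.py: the helper names scan_to_points and cluster_centroids are hypothetical, rays are assumed evenly spaced with ray 0 as the leftmost (+FOV/2), a reading at or beyond max range is treated as "no return", and clusters are split with a simple adjacent-gap rule. The real module also applies filters that are not reproduced here.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def scan_to_points(
    ranges: Sequence[float],
    fov_rad: float,
    max_range: float,
    pose: Tuple[float, float, float],
) -> List[Optional[Point]]:
    """Project a 1-D range image to world-frame hit points (None = no return).

    Assumptions: rays are evenly spaced across the FOV and ray 0 points
    at heading + fov/2 (the leftmost ray).
    """
    x0, y0, heading = pose
    step = fov_rad / (len(ranges) - 1)
    points: List[Optional[Point]] = []
    for i, r in enumerate(ranges):
        if not math.isfinite(r) or r >= max_range:
            points.append(None)  # nothing hit along this ray
            continue
        angle = heading + fov_rad / 2 - i * step
        points.append((x0 + r * math.cos(angle), y0 + r * math.sin(angle)))
    return points


def cluster_centroids(points: Sequence[Optional[Point]], gap: float) -> List[Point]:
    """Group consecutive hits into clusters and return each cluster's centroid.

    A cluster ends at a ray with no return, or when two adjacent hits are
    more than `gap` metres apart (a hypothetical segmentation rule).
    """
    clusters: List[List[Point]] = []
    current: List[Point] = []
    for p in points:
        if p is None or (current and math.dist(p, current[-1]) > gap):
            if current:
                clusters.append(current)
            current = []
        if p is not None:
            current.append(p)
    if current:
        clusters.append(current)
    return [
        (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
        for c in clusters
    ]
```

For the canonical dog the call would be scan_to_points(ranges, math.radians(140), 12.0, pose) on a 180-value range image; the gap threshold is a free parameter in this sketch.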
LiDAR validation (intermediate-goal item v in docs/project.md):
during development, a diagnostic-dump controller captured 80 real
Webots scans together with the ground-truth sheep positions. After
the +SHEEP_RADIUS surface-to-centre correction, the clustered
centroids matched the GT positions to within 0.15 m. In other words,
the LiDAR pipeline produces correct sheep-position estimates from
real Webots scans, which validates the sensor for the herding task.
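A plausible reading of the surface-to-centre correction is sketched below: LiDAR returns lie on the sheep's near surface, so their centroid sits about one radius closer to the dog than the sheep's centre, and the estimate is pushed outward along the dog-to-centroid bearing. The function names are hypothetical, the SHEEP_RADIUS value is not given here (0.3 m in the test is a placeholder), and the max-nearest-distance metric is only one way the comparison against GT could have been scored.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def surface_to_centre(centroid: Point, dog_xy: Point, sheep_radius: float) -> Point:
    """Move a cluster centroid one sheep radius further from the dog."""
    dx = centroid[0] - dog_xy[0]
    dy = centroid[1] - dog_xy[1]
    d = math.hypot(dx, dy)
    if d == 0.0:
        return centroid  # degenerate case: no bearing to push along
    scale = (d + sheep_radius) / d
    return (dog_xy[0] + dx * scale, dog_xy[1] + dy * scale)


def max_position_error(estimates: Sequence[Point], ground_truth: Sequence[Point]) -> float:
    """Worst distance from any GT sheep to its nearest estimate."""
    return max(min(math.dist(g, e) for e in estimates) for g in ground_truth)
```

A scan would then pass the reported criterion when max_position_error(corrected, gt) <= 0.15.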
The tracker outputs a {name: (x, y)} dict with exactly the same shape
as the earlier receiver-based perception, so Strömbom, Sequential, and
the BC observation builder all run unchanged on top of it. At training
time the 2D Gymnasium env raycasts sheep discs
(herding/perception/lidar_sim.py), so demos collected in the env match
the perception the deployed controller sees in Webots.
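To show how such a tracker can produce the {name: (x, y)} dict and keep sheep that leave the FOV, here is a minimal greedy nearest-neighbour sketch. It is an assumption-laden illustration, not sheep_tracker.py: the class name, the 1 m gate and the "sheep_N" naming are hypothetical, and the consensus-promotion stage of the real tracker is not modelled.

```python
import math
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]


class NearestNeighbourTracker:
    """Greedy nearest-neighbour tracker that remembers unseen sheep.

    Tracks that receive no detection keep their last-seen position,
    which is how sheep outside the FOV stay in the output dict.
    """

    def __init__(self, gate: float = 1.0) -> None:
        self.gate = gate  # max association distance [m], placeholder value
        self.tracks: Dict[str, Point] = {}
        self._next_id = 0

    def update(self, detections: List[Point]) -> Dict[str, Point]:
        # Every (distance, track, detection) pair, closest first.
        pairs = sorted(
            (math.dist(pos, det), name, i)
            for name, pos in self.tracks.items()
            for i, det in enumerate(detections)
        )
        used_tracks: Set[str] = set()
        used_dets: Set[int] = set()
        for dist, name, i in pairs:
            if dist > self.gate:
                break  # all remaining pairs are even farther apart
            if name in used_tracks or i in used_dets:
                continue
            self.tracks[name] = detections[i]
            used_tracks.add(name)
            used_dets.add(i)
        # Detections that matched no existing track start a new one.
        for i, det in enumerate(detections):
            if i not in used_dets:
                self.tracks[f"sheep_{self._next_id}"] = det
                self._next_id += 1
        return dict(self.tracks)
```

Because the output is a plain name-to-position dict, any consumer written against ground-truth positions can take it unchanged, which is the property the paragraph above relies on.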
Privileged ground-truth perception is available for ablation via
HerdingEnv(use_lidar=False).
Quick start
```bash
# 1. Set up the Python env (any venv with PyTorch + SB3)
pip install -r training/requirements.txt

# 2. Smoke test (126 pytest cases, < 1 s)
make test

# 3. Reproduce a full pipeline (DRIVE+WORLD specific, ~1 h CPU)
make DRIVE=differential WORLD=field          # demos -> bc -> rl -> eval
make DRIVE=differential WORLD=field_round
make DRIVE=mecanum WORLD=field               # see note below
make train_all                               # all 4 combos sequentially

# Individual stages (each rebuilds upstream artefacts if missing):
make DRIVE=differential WORLD=field bc_demos # sim demos
make DRIVE=differential WORLD=field bc       # behaviour clone
make DRIVE=differential WORLD=field rl       # KL-PPO fine-tune
make DRIVE=differential WORLD=field eval     # 10-seed env eval

# 4. Run in Webots: interactive picker (recommended starting point)
tools/webots_menu.sh
# Prompts for mode / drive / world / LiDAR FOV / number of dogs /
# flock size / perception (LiDAR vs GT) / headless, then dispatches.

# Or invoke the launcher directly:
tools/run_webots.sh 10 bc differential field        # BC, diff, rect field
tools/run_webots.sh 10 rl differential field_round  # RL, diff, round field
tools/run_webots.sh 5 strombom differential field   # analytic baseline
HERDING_USE_GT=1 tools/run_webots.sh 5 strombom differential field   # GT bypass ablation
HERDING_LIDAR=360 tools/run_webots.sh 5 bc differential field        # 360° FOV ablation
HERDING_NDOGS=2 HERDING_AXIS_LEAK=0.3 tools/run_webots.sh 5 strombom differential field  # dual-shepherd axis split
```
make help lists every Makefile target and the overridable hyperparameters.
Mecanum note: ShepherdDogMecanum.proto uses physical roller hinges
in Webots. Calibration in Webots shows ~60% strafe efficiency and
~28% backward bleed relative to textbook mecanum; the gym kinematics
in the HERDING_MEC_WEBOTS preset are tuned to match. Mecanum BC/RL
policies need to be retrained against this preset; see the retrain
flow in the Mecanum results section below.
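The sketch below shows one way a calibrated preset could differ from textbook mecanum kinematics: wheel speeds are computed with the textbook model, and the realised body velocity is scaled by per-axis efficiencies (0.89 forward and 0.60 strafe, the calibration figures quoted in this README). It is not the code in diffdrive.py or HERDING_MEC_WEBOTS. The class and function names, the wheel dimensions, and the sign convention (one common textbook choice) are assumptions, and the ~28% backward bleed is deliberately not modelled.

```python
from dataclasses import dataclass
from typing import Tuple

# Wheel order: (front-left, front-right, rear-left, rear-right) [rad/s]
Wheels = Tuple[float, float, float, float]


@dataclass(frozen=True)
class MecanumParams:
    wheel_radius: float      # r [m]
    half_length: float       # lx: half the wheelbase [m]
    half_width: float        # ly: half the track width [m]
    fwd_eff: float = 1.0     # fraction of textbook forward speed realised
    strafe_eff: float = 1.0  # fraction of textbook sideways speed realised


def wheel_speeds(p: MecanumParams, vx: float, vy: float, wz: float) -> Wheels:
    """Textbook inverse kinematics: body twist -> wheel angular speeds."""
    k = (p.half_length + p.half_width) * wz
    r = p.wheel_radius
    return ((vx - vy - k) / r, (vx + vy + k) / r, (vx + vy - k) / r, (vx - vy + k) / r)


def body_velocity(p: MecanumParams, w: Wheels) -> Tuple[float, float, float]:
    """Forward kinematics scaled by the efficiency factors (no bleed term)."""
    fl, fr, rl, rr = w
    r = p.wheel_radius
    vx = r / 4.0 * (fl + fr + rl + rr)
    vy = r / 4.0 * (-fl + fr + rl - rr)
    wz = r / (4.0 * (p.half_length + p.half_width)) * (-fl + fr - rl + rr)
    return vx * p.fwd_eff, vy * p.strafe_eff, wz


# Hypothetical dimensions; only the efficiency figures come from the calibration.
textbook = MecanumParams(wheel_radius=0.05, half_length=0.2, half_width=0.15)
calibrated = MecanumParams(0.05, 0.2, 0.15, fwd_eff=0.89, strafe_eff=0.60)
cmd = wheel_speeds(textbook, 0.0, 0.5, 0.0)  # request a pure 0.5 m/s strafe
print(body_velocity(textbook, cmd))    # ≈ (0.0, 0.5, 0.0)
print(body_velocity(calibrated, cmd))  # ≈ (0.0, 0.3, 0.0)
```

A policy trained only on the first model would expect full strafe speed and undershoot on the robot, which is why the mecanum policies need retraining against the matched preset.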
Documentation map
- This README is the project overview: architecture, quick start, and headline results.
- training/README.md has the command-level training and evaluation details for demo collection, BC, PPO fine-tuning, and policy artifacts.
- docs/project.md is the original course proposal/goals document, kept for traceability rather than as run instructions.
Layout
```text
herding/                    — perception / control / world primitives
  config.py                 — frozen dataclasses for all tunable parameters;
                              named presets HERDING_DEFAULT / HERDING_WEBOTS /
                              HERDING_MEC_WEBOTS
  world/
    geometry.py             field/pen constants, world-shape switch
    diffdrive.py            differential + mecanum kinematics
    flocking_sim.py         Reynolds + Strömbom 2014 sheep dynamics
  perception/
    lidar_sim.py            fast 2D raycast for the gym env
    lidar_perception.py     scan → world-frame cluster centroids + filters
    sheep_tracker.py        multi-target NN tracker with FOV memory
                            and the consensus-promotion stage
    obs.py                  32-D order-invariant observation builder
  control/
    strombom.py             canonical CoM collect/drive heuristic
                            (round-world aware)
    sequential.py           single-target "pin-and-push" alternative
    universal.py            teacher used for BC demo collection
                            (Strömbom + mecanum omega + straggler recovery)
    active_scan.py          rotate-on-empty + walk-to-centre fallback
    modulation.py           shared near-sheep speed-modulation helper
controllers/
  sheep/sheep.py            — Webots sheep controller
  shepherd_dog/
    shepherd_dog.py         — Webots dog controller, mode-switched
    policy_loader.py        — SB3 PPO / RecurrentPPO loader with frame stack
training/
  herding_env.py            — Gymnasium env (LiDAR + tracker by default)
  bc/collect.py             — sim demos via the active-scan teacher
  bc/pretrain.py            — supervised BC into MLP
  rl/train.py               — KL-regularised PPO fine-tune of BC
  rl/train_lstm.py          — RecurrentPPO variant (ablation)
  eval.py                   — analytic + learned policy comparison harness
  runs/                     — checkpoints (whitelisted in .gitignore)
  requirements.txt
tests/                      — 126 pytest cases, < 1 s on CPU
tools/
  run_webots.sh             — launch Webots with N sheep + chosen mode + world
  webots_sweep.sh           — headless sweep across modes × drives × worlds
  webots_sweep_gt.sh        — same with HERDING_USE_GT=1 (perfect perception)
  calibrate_mecanum.sh      — measure mecanum body velocity vs gym prediction
  gen_mecanum_wheels.py     — regenerate the 32 mecanum roller hinges
  benchmark_lidar.py        — tracker quality benchmark
Makefile                    — pipeline orchestrator
                              (make DRIVE=… WORLD=… rl, make train_all, …)
worlds/
  field.wbt                 — rectangular world (3 m gate, external pen)
  field_round.wbt           — circular world (radius 15 m, same pen)
protos/
  Sheep.proto               — sheep robot
  ShepherdDog.proto         — diff-drive dog, 140° LiDAR
  ShepherdDog360.proto      — diff-drive dog, 360° LiDAR (ablation)
  ShepherdDogMecanum.proto  — 4-wheel mecanum with physical roller hinges
docs/project.md             — original course proposal/goals
```
Shared low-level control
Every dog mode (Strömbom, Sequential, BC, RL) routes its action
through herding/control/modulation.py:modulate_speed_near_sheep,
which scales the action magnitude down when the dog is within ~2.5 m
of the nearest tracked sheep. This stops the dog from charging in at
full speed and scattering the flock, while the direction (intent) is
preserved.
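A minimal sketch of this kind of near-sheep modulation follows. It is not modulate_speed_near_sheep itself: the name scale_action_near_sheep is hypothetical, and both the linear ramp and the 0.3 floor are assumptions. Only the ~2.5 m radius and the direction-preserving behaviour come from the description above.

```python
import math
from typing import Sequence, Tuple

Vec2 = Tuple[float, float]


def scale_action_near_sheep(
    action: Vec2,
    dog_xy: Vec2,
    sheep_xy: Sequence[Vec2],
    radius: float = 2.5,     # from the README: modulation starts ~2.5 m out
    min_scale: float = 0.3,  # hypothetical floor so the dog never stalls
) -> Vec2:
    """Shrink the action's magnitude near sheep without changing its direction."""
    if not sheep_xy:
        return action  # nothing tracked: leave the action untouched
    nearest = min(math.dist(dog_xy, s) for s in sheep_xy)
    # Linear ramp: min_scale at contact, 1.0 at `radius` metres and beyond.
    t = min(max(nearest / radius, 0.0), 1.0)
    scale = min_scale + (1.0 - min_scale) * t
    # Multiplying both components by one scalar keeps the heading (intent).
    return (action[0] * scale, action[1] * scale)
```

Scaling the vector by a single scalar, rather than clipping each component separately, is what keeps the direction intact.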
All modes also share the same EMA action smoother in
controllers/shepherd_dog/shepherd_dog.py:ACTION_SMOOTH = 0.55.
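An EMA action smoother of this kind can be sketched as below. The sketch assumes ACTION_SMOOTH is the weight on the previous smoothed action (so larger values smooth more); the README does not say which term the 0.55 multiplies, and the class name ActionSmoother is hypothetical.

```python
from typing import Optional, Sequence, Tuple


class ActionSmoother:
    """Exponential moving average over successive actions.

    Assumption: alpha weights the previous smoothed action, i.e.
    smoothed = alpha * previous + (1 - alpha) * new.
    """

    def __init__(self, alpha: float = 0.55) -> None:
        self.alpha = alpha
        self._prev: Optional[Tuple[float, ...]] = None

    def __call__(self, action: Sequence[float]) -> Tuple[float, ...]:
        if self._prev is None:
            self._prev = tuple(action)  # first step: no history to blend with
        else:
            self._prev = tuple(
                self.alpha * p + (1.0 - self.alpha) * a
                for p, a in zip(self._prev, action)
            )
        return self._prev

    def reset(self) -> None:
        """Forget history, e.g. at the start of a new episode."""
        self._prev = None
```

With alpha = 0.55 and this convention, a step change in the command reaches 45% of its new value after one control step.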
Results — Webots end-to-end, canonical 140° LiDAR
Each cell is the control step X at which the run reported "OK at step X",
i.e. the dog penned all N sheep in a single trial (lower is faster). All
runs use HERDING_USE_GT=0 (LiDAR perception, no ground-truth bypass) and
the default consensus tracker.
Differential drive
| Mode | World | n=5 | n=10 |
|---|---|---|---|
| Strömbom | field | 7528 | 11620 |
| Strömbom | field_round | 8611 | 10339 |
| Sequential | field | 7135 | 16843 |
| Sequential | field_round | 6019 | 8494 |
| BC | field | 11698 | 15079 |
| BC | field_round | 7234 | 11320 |
| RL | field | 10039 | 13954 |
| RL | field_round | 5803 | 9151 |
RL is strictly faster than BC on every comparable cell.
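The RL-vs-BC comparison can be checked directly from the differential-drive table above. The snippet copies the BC and RL cells and computes how many steps RL saves in each; the helper name is ours, not part of the repo.

```python
# (world, n sheep) -> (BC steps, RL steps), differential drive, LiDAR perception.
BC_VS_RL = {
    ("field", 5): (11698, 10039),
    ("field", 10): (15079, 13954),
    ("field_round", 5): (7234, 5803),
    ("field_round", 10): (11320, 9151),
}


def step_reduction_pct(bc_steps: int, rl_steps: int) -> float:
    """Percentage of BC's steps that RL saves (positive means RL is faster)."""
    return 100.0 * (bc_steps - rl_steps) / bc_steps


for (world, n), (bc, rl) in BC_VS_RL.items():
    print(f"{world:<12} n={n:<2} RL saves {bc - rl:>5} steps "
          f"({step_reduction_pct(bc, rl):.1f}%)")
```

The saving ranges from about 7.5% (field, n=10) to about 19.8% (field_round, n=5). Since each cell is a single trial, these are point comparisons, not statistically tested differences.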
LiDAR vs GT bypass (diff drive)
GT bypass replaces the LiDAR tracker with perfect emitter positions.
LiDAR is the default; GT is a perception ablation
(HERDING_USE_GT=1):
| Mode | World | n=5 LiDAR | n=5 GT | n=10 LiDAR | n=10 GT |
|---|---|---|---|---|---|
| Strömbom | field | 7528 | 5254 | 11620 | 7342 |
| Strömbom | field_round | 8611 | 3631 | 10339 | 7084 |
| Sequential | field | 7135 | 11092 | 16843 | 8698 |
| Sequential | field_round | 6019 | 3454 | 8494 | 7324 |
GT is faster in seven of the eight cells (perfect perception means fewer wasted steps). The exception is Sequential / field / n=5, where GT is slower; its straggler heuristic appears to over-correct when the dog has full information.
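The same kind of check for the perception ablation: the snippet copies the LiDAR and GT cells from the table above, prints the GT/LiDAR step ratio per cell (below 1.0 means GT is faster), and lists the cells where GT is slower.

```python
# (mode, world, n sheep) -> (LiDAR steps, GT steps), differential drive.
LIDAR_VS_GT = {
    ("Strömbom", "field", 5): (7528, 5254),
    ("Strömbom", "field", 10): (11620, 7342),
    ("Strömbom", "field_round", 5): (8611, 3631),
    ("Strömbom", "field_round", 10): (10339, 7084),
    ("Sequential", "field", 5): (7135, 11092),
    ("Sequential", "field", 10): (16843, 8698),
    ("Sequential", "field_round", 5): (6019, 3454),
    ("Sequential", "field_round", 10): (8494, 7324),
}

# Cells where perfect perception needed MORE steps than the LiDAR tracker.
gt_slower = [cell for cell, (lidar, gt) in LIDAR_VS_GT.items() if gt > lidar]

for (mode, world, n), (lidar, gt) in LIDAR_VS_GT.items():
    print(f"{mode:<10} {world:<12} n={n:<2} GT/LiDAR = {gt / lidar:.2f}")
print("GT slower in:", gt_slower)
```

The largest gap is Strömbom / field_round / n=5, where GT needs fewer than half the LiDAR steps (3631 vs 8611).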
Mecanum (differential is the headline)
ShepherdDogMecanum.proto has 32 physical roller hinges giving true
omnidirectional motion in Webots; tools/calibrate_mecanum.sh
confirms the X-pattern. Calibration shows ~60% strafe efficiency
relative to textbook mecanum (versus ~89% for forward motion), so the
gym has to model the imperfect physical mecanum for the trained policy
to compensate. HERDING_MEC_WEBOTS is the matched preset;
training/bc/collect.py and training/rl/train.py auto-select it for
mecanum runs. The existing mecanum policies were trained on the
textbook gym, so they need to be retrained against HERDING_MEC_WEBOTS
(≈ 2 h per combo, 4 combos):
```bash
python -m training.bc.collect \
    --drive-mode mecanum --world field --use-webots-preset \
    --out training/bc/demos_mecanum_field.npz
python -m training.bc.pretrain \
    --demos training/bc/demos_mecanum_field.npz \
    --out training/runs/bc_mecanum_field
python -m training.rl.train \
    --bc training/runs/bc_mecanum_field \
    --out training/runs/rl_mecanum_field \
    --drive-mode mecanum --world field --use-webots-preset
```
Repeat for field_round.
License
Educational project for the Topics in Intelligent Robotics course.