# Autonomous Shepherd-Dog Herding (Webots + RL)

Group G25 — *Diogo Costa, Johnny Fernandes, Nelson Neto*

A shepherd dog that herds 1–10 sheep through a 3 m gate into an external pen. The project covers two worlds (`field` rectangular, `field_round` circular), two drives (`differential`, `mecanum`), and four deployable control modes:

| Mode | Source | Role |
|---|---|---|
| `strombom` | Strömbom et al. (2014) collect/drive heuristic | Analytic baseline |
| `sequential` | Single-target "pin-and-push" | Alternative analytic baseline |
| `bc` | Behaviour cloning of the universal teacher | Imitation learning result |
| `rl` | KL-regularised PPO fine-tune of `bc` | Reward-driven refinement |

## Perception

The dog perceives sheep **only through its front-mounted 140° LiDAR** (180 rays, 12 m max range; see `protos/ShepherdDog.proto`). Each control step:

1. Read `lidar.getRangeImage()`.
2. Cluster returns into world-frame `(x, y)` estimates (`herding/perception/lidar_perception.py`).
3. Fold them into a multi-target tracker that maintains last-seen positions for sheep currently outside the FOV (`herding/perception/sheep_tracker.py`).

**LiDAR validation** (intermediate-goal item v from `docs/project.md`): during development, a diagnostic-dump controller captured 80 real Webots scans together with the ground-truth (GT) sheep positions. Compared against GT, the clustered centroids matched the true positions to within 0.15 m after the `+SHEEP_RADIUS` surface-to-centre correction. The LiDAR pipeline therefore produces correct sheep-position estimates from real Webots scans, which validates the sensor for the herding task.

The tracker outputs a `{name: (x, y)}` dict shaped exactly like the prior receiver-based one, so Strömbom, Sequential, and the BC obs builder all run unchanged on top of it.

The 2D Gymnasium env raycasts sheep discs at training time (`herding/perception/lidar_sim.py`), so demos collected in the env match the perception the deployed controller sees in Webots.
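
The clustering step and the surface-to-centre correction described above can be sketched as follows. This is a minimal illustration, not the code in `herding/perception/lidar_perception.py`: the function name `cluster_scan`, the `gap` threshold, the `SHEEP_RADIUS` value, and the left-to-right ray ordering are all assumptions made for the example.

```python
import math

SHEEP_RADIUS = 0.3  # assumed value in metres; the project's constant may differ


def _point(bearing, rng):
    """Polar (bearing, range) to Cartesian in the sensor frame."""
    return rng * math.cos(bearing), rng * math.sin(bearing)


def cluster_scan(ranges, pose, fov_deg=140.0, max_range=12.0, gap=0.3):
    """Estimate world-frame (x, y) sheep centres from one range scan.

    ranges: per-ray distances, assumed ordered left to right across the FOV.
    pose:   (x, y, heading) of the LiDAR in the world frame, heading in rad.
    gap:    hypothetical max distance between neighbouring hits in a cluster.
    """
    x0, y0, heading = pose
    n = len(ranges)
    fov = math.radians(fov_deg)

    # Keep only rays that hit something inside the usable range.
    hits = []
    for i, r in enumerate(ranges):
        if math.isfinite(r) and r < max_range:
            bearing = fov / 2 - i * fov / max(n - 1, 1)  # left edge is +fov/2
            hits.append((i, bearing, r))

    # Group hits on consecutive rays whose end points lie close together.
    clusters = []
    for hit in hits:
        if clusters:
            prev = clusters[-1][-1]
            close = math.dist(_point(prev[1], prev[2]),
                              _point(hit[1], hit[2])) < gap
            if hit[0] == prev[0] + 1 and close:
                clusters[-1].append(hit)
                continue
        clusters.append([hit])

    centres = []
    for cluster in clusters:
        bearing = sum(h[1] for h in cluster) / len(cluster)
        # Surface-to-centre correction: the nearest return lies on the disc
        # surface, so the centre is roughly one radius further along the ray.
        rng = min(h[2] for h in cluster) + SHEEP_RADIUS
        bx, by = _point(bearing, rng)
        # Rotate and translate from the sensor frame into the world frame.
        wx = x0 + bx * math.cos(heading) - by * math.sin(heading)
        wy = y0 + bx * math.sin(heading) + by * math.cos(heading)
        centres.append((wx, wy))
    return centres
```

In this sketch the centroids would then be handed to the tracker; filtering of wall returns and cluster-size checks are left out for brevity.
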
Privileged ground-truth perception is available for ablation via `HerdingEnv(use_lidar=False)`.

## Quick start

```bash
# 1. Set up the Python env (any venv with PyTorch + SB3)
pip install -r training/requirements.txt

# 2. Smoke test (126 pytest cases, < 1 s)
make test

# 3. Reproduce a full pipeline (DRIVE+WORLD specific, ~1 h CPU)
make DRIVE=differential WORLD=field          # demos -> bc -> rl -> eval
make DRIVE=differential WORLD=field_round
make DRIVE=mecanum      WORLD=field          # see note below
make train_all                               # all 4 combos sequentially

# Individual stages (each rebuilds upstream artefacts if missing):
make DRIVE=differential WORLD=field bc_demos # sim demos
make DRIVE=differential WORLD=field bc       # behaviour clone
make DRIVE=differential WORLD=field rl       # KL-PPO fine-tune
make DRIVE=differential WORLD=field eval     # 10-seed env eval

# 4. Run in Webots — interactive picker (recommended starting point)
tools/webots_menu.sh
# Prompts for mode / drive / world / LiDAR FOV / number of dogs /
# flock size / perception (LiDAR vs GT) / headless, then dispatches.

# Or invoke the launcher directly:
tools/run_webots.sh 10 bc differential field          # BC, diff, rect field
tools/run_webots.sh 10 rl differential field_round    # RL, diff, round field
tools/run_webots.sh 5 strombom differential field     # analytic baseline
HERDING_USE_GT=1 tools/run_webots.sh 5 strombom differential field   # GT bypass ablation
HERDING_LIDAR=360 tools/run_webots.sh 5 bc differential field        # 360° FOV ablation
HERDING_NDOGS=2 HERDING_AXIS_LEAK=0.3 tools/run_webots.sh 5 strombom differential field   # dual-shepherd axis split
```

`make help` lists every Makefile target and the overridable hyperparameters.

**Mecanum note**: the `ShepherdDogMecanum.proto` uses physical roller hinges in Webots. The Webots calibration shows ~60% strafe efficiency and ~28% backward bleed compared to textbook mecanum; the gym kinematics in `HERDING_MEC_WEBOTS` are tuned to match.
**Mecanum BC/RL policies need to be retrained against this preset**; see the retrain flow in the Mecanum results section below.

## Documentation map

- This README is the project overview: architecture, quick start, and headline results.
- `training/README.md` has the command-level training and evaluation details for demo collection, BC, PPO fine-tuning, and policy artifacts.
- `docs/project.md` is the original course proposal/goals document, kept for traceability rather than as run instructions.

## Layout

```
herding/                      — perception / control / world primitives
  config.py                   — frozen dataclasses for all tunable parameters;
                                named presets HERDING_DEFAULT / HERDING_WEBOTS / HERDING_MEC_WEBOTS
  world/
    geometry.py               field/pen constants, world-shape switch
    diffdrive.py              differential + mecanum kinematics
    flocking_sim.py           Reynolds + Strömbom 2014 sheep dynamics
  perception/
    lidar_sim.py              fast 2D raycast for the gym env
    lidar_perception.py       scan → world-frame cluster centroids + filters
    sheep_tracker.py          multi-target NN tracker with FOV memory and
                              the consensus-promotion stage
    obs.py                    32-D order-invariant observation builder
  control/
    strombom.py               canonical CoM collect/drive heuristic (round-world aware)
    sequential.py             single-target "pin-and-push" alternative
    universal.py              teacher used for BC demo collection
                              (Strömbom + mecanum omega + straggler recovery)
    active_scan.py            rotate-on-empty + walk-to-centre fallback
    modulation.py             shared near-sheep speed-modulation helper
controllers/
  sheep/sheep.py              — Webots sheep controller
  shepherd_dog/
    shepherd_dog.py           — Webots dog controller, mode-switched
    policy_loader.py          — SB3 PPO / RecurrentPPO loader with frame stack
training/
  herding_env.py              — Gymnasium env (LiDAR + tracker by default)
  bc/collect.py               — sim demos via the active-scan teacher
  bc/pretrain.py              — supervised BC into MLP
  rl/train.py                 — KL-regularised PPO fine-tune of BC
  rl/train_lstm.py            — RecurrentPPO variant (ablation)
  eval.py                     — analytic + learned policy comparison harness
  runs/                       — checkpoints (whitelisted in .gitignore)
  requirements.txt
tests/                        — 126 pytest cases, < 1 s on CPU
tools/
  run_webots.sh               — launch Webots with N sheep + chosen mode + world
  webots_sweep.sh             — headless sweep across modes × drives × worlds
  webots_sweep_gt.sh          — same with HERDING_USE_GT=1 (perfect perception)
  calibrate_mecanum.sh        — measure mecanum body velocity vs gym prediction
  gen_mecanum_wheels.py       — regenerate the 32 mecanum roller hinges
  benchmark_lidar.py          — tracker quality benchmark
Makefile                      — pipeline orchestrator (make DRIVE=… WORLD=… rl, make train_all, …)
worlds/
  field.wbt                   — rectangular world (3 m gate, external pen)
  field_round.wbt             — circular world (radius 15 m, same pen)
protos/
  Sheep.proto                 — sheep robot
  ShepherdDog.proto           — diff-drive dog, 140° LiDAR
  ShepherdDog360.proto        — diff-drive dog, 360° LiDAR (ablation)
  ShepherdDogMecanum.proto    — 4-wheel mecanum with physical roller hinges
docs/project.md               — original course proposal/goals
```

## Shared low-level control

Every dog mode (Strömbom, Sequential, BC, RL) routes its action through `herding/control/modulation.py:modulate_speed_near_sheep`, which scales the action magnitude down when the dog is within ~2.5 m of the nearest tracked sheep. This stops the dog from charging in at full speed and scattering the flock. Direction (intent) is preserved.

All modes also share the same EMA action smoother in `controllers/shepherd_dog/shepherd_dog.py:ACTION_SMOOTH = 0.55`.

## Results — Webots end-to-end, canonical 140° LiDAR

Each cell gives the control step at which the dog had penned all N sheep in a single trial, with `HERDING_USE_GT=0` (LiDAR perception, no ground-truth bypass) and the default consensus tracker.
### Differential drive

| Mode | World | n=5 | n=10 |
|---|---|---:|---:|
| Strömbom | field | 7528 | 11620 |
| Strömbom | field_round | 8611 | 10339 |
| Sequential | field | 7135 | 16843 |
| Sequential | field_round | 6019 | 8494 |
| BC | field | 11698 | 15079 |
| BC | field_round | 7234 | 11320 |
| RL | field | 10039 | 13954 |
| RL | field_round | 5803 | 9151 |

RL is **strictly faster than BC** on every comparable cell.

### LiDAR vs GT bypass (diff drive)

GT bypass replaces the LiDAR tracker with perfect emitter positions. LiDAR is the default; GT is a perception ablation (`HERDING_USE_GT=1`). Bold marks the faster of each LiDAR/GT pair:

| Mode | World | n=5 LiDAR | n=5 GT | n=10 LiDAR | n=10 GT |
|---|---|---:|---:|---:|---:|
| Strömbom | field | 7528 | **5254** | 11620 | **7342** |
| Strömbom | field_round | 8611 | **3631** | 10339 | **7084** |
| Sequential | field | **7135** | 11092 | 16843 | **8698** |
| Sequential | field_round | 6019 | **3454** | 8494 | **7324** |

GT is generally faster (perfect perception → fewer wasted steps). Sequential n=5 / field is the one cell where GT is *slower*; its straggler heuristic appears to over-correct when the dog has full information.

### Mecanum (differential is the headline)

`ShepherdDogMecanum.proto` has 32 physical roller hinges giving true omnidirectional motion in Webots; `tools/calibrate_mecanum.sh` confirms the X-pattern. Calibration shows ~60% strafe efficiency vs textbook (versus ~89% on forward), so the gym needs to match the imperfect physical mecanum for the trained policy to compensate. `HERDING_MEC_WEBOTS` is the matched preset; `training/bc/collect.py` and `training/rl/train.py` auto-select it for mecanum runs.
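
To illustrate what matching the gym to the calibrated robot could look like, the sketch below scales textbook mecanum forward kinematics by the forward and strafe efficiencies quoted above. It is an assumption-laden illustration rather than the contents of `HERDING_MEC_WEBOTS`: the `MecanumPreset` class, its field names, the wheel geometry, and the choice to apply the efficiencies as simple per-axis multipliers are all hypothetical, and the backward-bleed term is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MecanumPreset:
    """Hypothetical preset; field names and geometry are illustrative only."""
    wheel_radius: float = 0.05  # m, assumed
    half_length: float = 0.20   # m, half the wheelbase, assumed
    half_width: float = 0.15    # m, half the track width, assumed
    forward_eff: float = 0.89   # ~89% forward efficiency quoted in the text
    strafe_eff: float = 0.60    # ~60% strafe efficiency quoted in the text


def body_velocity(wheels, preset=MecanumPreset()):
    """Map wheel speeds (FL, FR, RL, RR, in rad/s) to body (vx, vy, omega).

    Uses the textbook X-pattern mecanum equations (x forward, y left), then
    scales the translational components by the calibrated efficiencies.
    """
    fl, fr, rl, rr = wheels
    r = preset.wheel_radius
    k = preset.half_length + preset.half_width
    vx = r / 4 * (fl + fr + rl + rr)
    vy = r / 4 * (-fl + fr + rl - rr)
    omega = r / (4 * k) * (-fl + fr - rl + rr)
    return preset.forward_eff * vx, preset.strafe_eff * vy, omega
```

With these placeholder numbers, a pure-strafe wheel command produces only 60% of the textbook lateral speed, which is the kind of mismatch a policy trained on ideal kinematics would not expect.
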
Mecanum policies were trained on the textbook gym, so they need to be retrained against `HERDING_MEC_WEBOTS` (≈ 2 h per combo, 4 combos):

```bash
python -m training.bc.collect \
    --drive-mode mecanum --world field --use-webots-preset \
    --out training/bc/demos_mecanum_field.npz
python -m training.bc.pretrain \
    --demos training/bc/demos_mecanum_field.npz \
    --out training/runs/bc_mecanum_field
python -m training.rl.train \
    --bc training/runs/bc_mecanum_field \
    --out training/runs/rl_mecanum_field \
    --drive-mode mecanum --world field --use-webots-preset
```

Repeat for `field_round`.

## License

Educational project for the *Topics in Intelligent Robotics* course.