# Autonomous Shepherd-Dog Herding (Webots + RL)

Group G25: *Diogo Costa, Johnny Fernandes, Nelson Neto*

A differential-drive shepherd dog that herds 1–10 sheep through a 3 m gate into an external pen. The dog has three deployable modes:

| Mode | Source | Role |
|---|---|---|
| `strombom` | Strömbom et al. (2014) collect/drive heuristic | Analytic baseline |
| `bc` | Behaviour cloning of the Strömbom teacher | Imitation learning result |
| `rl` | KL-regularised PPO fine-tune of `bc` | Reward-driven refinement |

`sequential` (single-target pin-and-push) is kept as an alternative analytic baseline.

## Perception

The dog perceives sheep **only through its front-mounted 140° LiDAR** (180 rays, 12 m max range; see `protos/ShepherdDog.proto`). Each control step:

1. Read `lidar.getRangeImage()`.
2. Cluster returns into world-frame `(x, y)` estimates (`herding/perception/lidar_perception.py`).
3. Fold them into a multi-target tracker that maintains last-seen positions for sheep currently outside the FOV (`herding/perception/sheep_tracker.py`).

**LiDAR validation** (intermediate-goal item v from `docs/project.md`): during development, a diagnostic-dump controller captured 80 real Webots scans together with the ground-truth (GT) sheep positions. Comparing detections against GT showed that clustered centroids match GT positions to within 0.15 m after the `+SHEEP_RADIUS` surface-to-centre correction. In other words, the LiDAR pipeline produces correct sheep-position estimates from the real Webots scan, which validates the sensor for the herding task.

The tracker outputs a `{name: (x, y)}` dict shaped exactly like the earlier receiver-based one, so Strömbom, Sequential, and the BC obs builder all run unchanged on top of it.

At training time the 2D Gymnasium env raycasts sheep discs with `herding/perception/lidar_sim.py`, so demos collected in the env match the perception the deployed controller sees in Webots. Privileged ground-truth perception is available for ablation via `HerdingEnv(use_lidar=False)`.
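The scan-to-position steps above can be illustrated with a minimal, self-contained sketch. It is not the code in `herding/perception/`: the function names (`ray_angles`, `raycast_discs`, `cluster_scan`), the `SHEEP_RADIUS` value, the `GAP` threshold, and the ray-ordering convention are all assumptions made for illustration. Only the 140° FOV, 180 rays, and 12 m range come from this README. The sketch raycasts sheep discs into a range image, groups adjacent hits, and applies a `+SHEEP_RADIUS` surface-to-centre correction to each cluster.

```python
import math

FOV = math.radians(140.0)   # README: 140 degree front-mounted LiDAR
N_RAYS = 180                # README: 180 rays
MAX_RANGE = 12.0            # README: 12 m max range
SHEEP_RADIUS = 0.3          # ASSUMED value in metres, illustration only
GAP = 0.5                   # ASSUMED max range jump (m) inside one cluster


def ray_angles(n=N_RAYS):
    # ASSUMPTION: ray 0 points to +FOV/2 (left), the last ray to -FOV/2.
    return [FOV / 2 - i * FOV / (n - 1) for i in range(n)]


def raycast_discs(dog_x, dog_y, heading, sheep):
    """Simulated scan: range to the nearest sheep disc per ray, inf on a miss."""
    ranges = []
    for a in ray_angles():
        dx, dy = math.cos(heading + a), math.sin(heading + a)
        best = math.inf
        for sx, sy in sheep:
            ox, oy = sx - dog_x, sy - dog_y
            t = ox * dx + oy * dy              # centre projected onto the ray
            d2 = ox * ox + oy * oy - t * t     # squared ray-to-centre distance
            if t > 0 and d2 <= SHEEP_RADIUS ** 2:
                hit = t - math.sqrt(SHEEP_RADIUS ** 2 - d2)
                if 0 < hit <= MAX_RANGE:
                    best = min(best, hit)
        ranges.append(best)
    return ranges


def cluster_scan(ranges, dog_x, dog_y, heading):
    """Group adjacent hits and return world-frame (x, y) centre estimates."""
    angles = ray_angles(len(ranges))
    clusters, current = [], []
    for i, r in enumerate(ranges):
        hit = math.isfinite(r) and r <= MAX_RANGE
        # Close the open cluster on a miss or on a large jump in range.
        if current and (not hit or abs(r - ranges[current[-1]]) > GAP):
            clusters.append(current)
            current = []
        if hit:
            current.append(i)
    if current:
        clusters.append(current)

    centres = []
    for idx in clusters:
        # Mean bearing of the visible arc, in the world frame.
        bearing = heading + sum(angles[i] for i in idx) / len(idx)
        # The nearest surface point lies on the line to the disc centre,
        # so the centre is one radius further out along that bearing.
        rng = min(ranges[i] for i in idx) + SHEEP_RADIUS
        centres.append((dog_x + rng * math.cos(bearing),
                        dog_y + rng * math.sin(bearing)))
    return centres


# Usage: two sheep in front of a dog at the origin, facing +x.
scan = raycast_discs(0.0, 0.0, 0.0, [(4.0, 1.0), (6.0, -2.0)])
print(cluster_scan(scan, 0.0, 0.0, 0.0))
```

A real pipeline would also need to reject returns from walls and the pen, and to separate sheep whose arcs touch, which this sketch does not attempt.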
## Quick start

```bash
# 1. Set up the Python env (any venv with PyTorch + SB3)
pip install -r training/requirements.txt

# 2. Smoke test (70 pytest cases, < 1 s)
make test

# 3. Reproduce the full pipeline (~30–60 min CPU)
make                  # demos -> bc -> rl -> eval

# Individual stages (each rebuilds upstream artefacts if missing):
make bc_demos         # sim demos
make bc               # behaviour clone
make rl               # KL-PPO fine-tune
make eval             # 10-seed env eval of rl

# 4. Run in Webots
make webots N=10 MODE=bc          # behaviour-cloned MLP
make webots N=10 MODE=rl          # KL-PPO fine-tune
make webots N=10 MODE=strombom    # analytic baseline
# (or invoke directly: tools/run_webots.sh 10 rl)
```

`make help` lists every target and the overridable hyperparameters (e.g. `make rl PPO_STEPS=2000000 KL=0.02`).

## Documentation map

- This README is the project overview: architecture, quick start, and headline results.
- `training/README.md` has the command-level training and evaluation details for demo collection, BC, PPO fine-tuning, and policy artifacts.
- `docs/project.md` is the original course proposal/goals document, kept for traceability rather than as run instructions.
## Layout

```
herding/                   — perception / control / world primitives
  world/                   — environment-side physics & geometry
    geometry.py              field/pen constants, robot specs
    diffdrive.py             differential-drive kinematics
    flocking_sim.py          Reynolds + Strömbom 2014 sheep dynamics
  perception/              — LiDAR → tracked-sheep pipeline
    lidar_sim.py             fast 2D raycast for the env
    lidar_perception.py      scan → world-frame cluster centroids + filters
    sheep_tracker.py         multi-target NN tracker with FOV memory
    obs.py                   32-D order-invariant observation builder
  control/                 — every dog mode's action source
    strombom.py              canonical CoM collect/drive heuristic
    sequential.py            single-target "pin-and-push" alternative
    active_scan.py           wraps a base teacher with opening rotation + walk-to-centre fallback
    modulation.py            shared near-sheep speed-modulation helper
controllers/
  sheep/sheep.py           — Webots sheep controller (uses herding.world.flocking_sim)
  shepherd_dog/
    shepherd_dog.py        — Webots dog controller, mode-switched
    policy_loader.py       — lazy SB3 policy loader (auto-detects frame stack)
training/
  herding_env.py           — Gymnasium env (LiDAR + tracker by default)
  bc/collect.py            — sim demos via the active-scan teacher
  bc/pretrain.py           — supervised BC of (obs, action) demos into MLP
  rl/train.py              — KL-regularised PPO fine-tune of BC
  eval.py                  — analytic + learned policy comparison harness
  bc/demos.npz             — collected demonstrations (gitignored)
  runs/                    — checkpoints (whitelisted in .gitignore)
  requirements.txt
tests/
  conftest.py              — pytest setup (adds project root to sys.path)
  test_geometry.py         — geometric predicates + constants
  test_diffdrive.py        — kinematics and (vx, vy) → wheel-speed map
  test_obs.py              — observation builder (shape, normalisation, order)
  test_control.py          — speed modulation + analytic teachers + active scan
  test_perception.py       — LiDAR sim + clustering + tracker
  test_env.py              — Gymnasium contract + determinism + reward
tools/
  run_webots.sh            — launch Webots with N sheep + chosen mode
Makefile                   — pipeline orchestrator (make / make rl / make test / …)
worlds/
  field.wbt                — main world (3 m gate, external pen)
protos/                    — Sheep / ShepherdDog robot definitions
docs/project.md            — original course proposal/goals
```

## Shared low-level control

Every dog mode (Strömbom, Sequential, BC, RL) routes its action through `herding/control/modulation.py:modulate_speed_near_sheep`, which scales the action magnitude down when the dog is within ~2.5 m of the nearest tracked sheep. This stops the dog from charging in at full speed and scattering the flock. Direction (intent) is preserved.

All modes also share the same EMA action smoother, configured by `ACTION_SMOOTH = 0.55` in `controllers/shepherd_dog/shepherd_dog.py`.

## Results: env eval, 10 seeds × n=1..10

`max_steps=15000`, full-field spawn distribution. The first table gives the success rate per flock size, the second the mean number of sheep penned per episode.

### Success rate (%)

| n | Strömbom | `bc` | `rl` |
|---:|---:|---:|---:|
| 1 | 30 | 80 | **90** |
| 2 | 90 | 50 | **90** |
| 3 | 60 | 90 | **90** |
| 4 | 40 | 80 | **90** |
| 5 | 60 | 70 | **100** |
| 6 | 30 | 80 | 80 |
| 7 | 70 | 80 | **100** |
| 8 | 30 | 100 | **100** |
| 9 | 40 | 90 | **100** |
| 10 | 50 | 100 | **100** |

### Mean penned per episode (out of n)

| n | Strömbom | `bc` | `rl` |
|---:|---:|---:|---:|
| 1 | 0.30 | 0.80 | **0.90** |
| 5 | 3.90 | 4.10 | **5.00** |
| 8 | 4.20 | 8.00 | **8.00** |
| 10 | 7.40 | 10.00 | **10.00** |

### Takeaways

- **BC clearly beats Strömbom** under realistic LiDAR conditions (full field, partial observability), at every flock size except n=2. Strömbom struggles on small flocks where a single sheep can spawn beyond the LiDAR's 12 m range; BC learned active perception from the demos.
- **RL refines BC** without regressing on any cell: it ties or beats BC at every flock size, with gains where BC's imitation of Strömbom's drive heuristic was sub-optimal (the largest in the success-rate table are at n=2, 50 → 90, and n=5, 70 → 100).
- **Aggressive reward shaping doesn't help**: a more aggressive variant (β=0.02, W_TIME=-0.1, W_IMITATE=0, 3 M steps) trained as an ablation was strictly worse than the conservative tune shipped here (β=0.05, W_IMITATE=0.5, 1 M steps).

## License

Educational project for the *Topics in Intelligent Robotics* course.