# Autonomous Shepherd-Dog Herding (Webots + RL)

Group G25: *Diogo Costa, Johnny Fernandes, Nelson Neto*

A differential-drive shepherd dog that herds 1–10 sheep through a 3 m gate into an external pen. The dog has three modes:

| Mode | Source | Notes |
|---|---|---|
| `rl` | Behavior cloning of an analytic teacher | The deliverable RL policy |
| `strombom` | Strömbom (2014) collect/drive heuristic | Canonical baseline |
| `sequential` | Single-target "pin and push" | Robust across n=1–10 |

There are also three documented experimental teachers (`hybrid`, `drive_only`, `strombom_smooth`); see `herding/` for details.

## Quick start

```bash
# 1. Set up the Python env (any venv with PyTorch + SB3)
pip install -r training/requirements.txt

# 2. Smoke test
python -m training.parity_test

# 3. Reproduce the BC policy from scratch (~25 min on CPU)
python -m tools.collect_demos --teacher strombom --out training/demos.npz \
    --seeds-per-n 30 --subsample 3
python -m training.bc_pretrain --demos training/demos.npz \
    --out training/runs/bc_flock --epochs 100 --net-arch 512,512

# 4. Evaluate
python -m training.eval --policy training/runs/bc_flock \
    --max-flock 10 --max-steps 30000 --n-seeds 5

# 5. Run in Webots (any of the three modes; n is the flock size)
HERDING_POLICY_DIR=$PWD/training/runs/bc_flock tools/run_webots.sh 10 rl
tools/run_webots.sh 10 strombom
tools/run_webots.sh 10 sequential
```

## Layout

```
herding/                    - single source of truth (env + Webots both import)
  geometry.py               - field/pen constants, robot specs
  flocking_sim.py           - Reynolds-style sheep dynamics
  diffdrive.py              - differential-drive kinematics
  obs.py                    - 32-D order-invariant observation builder
  strombom.py               - canonical CoM-drive teacher
  sequential.py             - single-target "pin-and-push" teacher
  hybrid.py                 - flock-then-funnel (experimental, did not scale)
  drive_only.py             - Strömbom drive without collect (experimental)
  strombom_smooth.py        - sigmoid-blended Strömbom (experimental)
controllers/
  sheep/sheep.py            - Webots sheep controller (uses herding.flocking_sim)
  shepherd_dog/
    shepherd_dog.py         - Webots dog controller, mode-switched
    policy_loader.py        - lazy SB3 PPO loader
    strombom.py             - backwards-compat shim
training/
  herding_env.py            - Gymnasium env (used for demo collection + eval)
  bc_pretrain.py            - supervised BC of analytic teachers into MLP policy
  collect_demos.py          - wrapper, see tools/
  eval.py                   - RL / analytic comparison harness
  parity_test.py            - smoke tests
  train_ppo.py              - PPO/RL fine-tune (experimental, BC alone preferred)
  requirements.txt
  configs/ppo_default.yaml
tools/
  collect_demos.py          - generate (obs, action) demonstrations
  run_webots.sh             - launch Webots with N sheep + chosen controller mode
worlds/
  field.wbt                 - main world (3 m gate, external pen)
protos/                     - Sheep / ShepherdDog robot definitions
docs/project.md             - original project goals
plan.md                     - design notes / decision log
```

## Two cohesion regimes

The strength of sheep cohesion determines which teacher works:

| Regime | `flocking_sim.py` setting | Strömbom | Sequential |
|---|---|---:|---:|
| **Tight** (current) | `w=3.0/1.0`, `dist=12` | works (flock-style) | breaks (cohesion fights single-sheep targeting) |
| Loose | `w=1.5/0.6`, `dist=8` | breaks (flock fragments at gate) | works (1-by-1 style) |

The codebase ships with the **tight** regime. To use the loose-regime Sequential clone, change those constants in `herding/flocking_sim.py` to the loose values and load `training/runs/bc_solo/`.

## Results

Success rates, evaluated with `--max-steps 30000 --n-seeds 5` at deployment difficulty (full-field spawn distribution):

| n | Strömbom | Sequential | BC-flock (RL) |
|---:|---:|---:|---:|
| 1 | 100 % | 100 % | 100 % |
| 5 | 100 % | 100 % | 80–100 % |
| 8 | 100 % | 100 % | 80 % |
| 10 | **100 %** | 80 % | **80 %** (mean_penned 8/10) |

For the larger flocks (n = 8 and n = 10), the BC policy reaches about 80 % of the Strömbom teacher's success rate; for n = 1 it matches the teacher. It runs entirely on neural-network inference, with no hand-coded logic.

## License

Educational project for the *Topics in Intelligent Robotics* course.