Autonomous Shepherd-Dog Herding (Webots + RL)
Group G25 — Diogo Costa, Johnny Fernandes, Nelson Neto
A differential-drive shepherd dog that herds 1–10 sheep through a 3 m gate into an external pen. The dog has three modes:
| Mode | Source | Notes |
|---|---|---|
| rl | Behavior cloning of an analytic teacher | The deliverable RL policy |
| strombom | Strömbom (2014) collect/drive heuristic | Canonical baseline |
| sequential | Single-target "pin and push" | Robust across n=1–10 |
Three further experimental teachers (hybrid, drive_only, strombom_smooth) are
documented in herding/.
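For readers unfamiliar with the collect/drive heuristic, the sketch below shows the commonly cited form of the rule: drive the flock from behind its centre of mass when it is compact, otherwise collect the farthest stray. It is an illustration only, not the code in herding/strombom.py; the function names, the `r_a` value, and the exact offsets are assumptions.

```python
import math


def centroid(points):
    """Mean position of a list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def behind(anchor, away_from, distance):
    """Point `distance` past `anchor` on the ray from `away_from` through `anchor`."""
    dx, dy = anchor[0] - away_from[0], anchor[1] - away_from[1]
    norm = math.hypot(dx, dy)
    if norm < 1e-9:  # degenerate case: the two points coincide
        return anchor
    return (anchor[0] + distance * dx / norm, anchor[1] + distance * dy / norm)


def strombom_target(sheep, goal, r_a=1.0):
    """Return (mode, waypoint) for the dog. r_a is an assumed interaction radius."""
    n = len(sheep)
    com = centroid(sheep)
    farthest = max(sheep, key=lambda s: math.dist(s, com))
    # Flock counts as compact if every sheep is within r_a * n^(2/3) of the centroid.
    if math.dist(farthest, com) <= r_a * n ** (2.0 / 3.0):
        # Drive: stand behind the centroid, on the side opposite the goal.
        return "drive", behind(com, goal, r_a * math.sqrt(n))
    # Collect: stand behind the farthest sheep, pushing it toward the centroid.
    return "collect", behind(farthest, com, r_a)
```

A low-level controller would then steer the differential-drive dog toward the returned waypoint; that part is omitted here.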
Quick start
# 1. Set up the Python env (any venv with PyTorch + SB3)
pip install -r training/requirements.txt
# 2. Smoke test
python -m training.parity_test
# 3. Reproduce the BC policy from scratch (~25 min on CPU)
python -m tools.collect_demos --teacher strombom --out training/demos.npz \
--seeds-per-n 30 --subsample 3
python -m training.bc_pretrain --demos training/demos.npz \
--out training/runs/bc_flock --epochs 100 --net-arch 512,512
# 4. Evaluate
python -m training.eval --policy training/runs/bc_flock \
--max-flock 10 --max-steps 30000 --n-seeds 5
# 5. Run in Webots (any of the three modes; n is the flock size)
HERDING_POLICY_DIR=$PWD/training/runs/bc_flock tools/run_webots.sh 10 rl
tools/run_webots.sh 10 strombom
tools/run_webots.sh 10 sequential
Layout
herding/ - single source of truth (env + Webots both import)
  geometry.py - field/pen constants, robot specs
  flocking_sim.py - Reynolds-style sheep dynamics
  diffdrive.py - differential-drive kinematics
  obs.py - 32-D order-invariant observation builder
  strombom.py - canonical CoM-drive teacher
  sequential.py - single-target "pin-and-push" teacher
  hybrid.py - flock-then-funnel (experimental, did not scale)
  drive_only.py - Strömbom drive without collect (experimental)
  strombom_smooth.py - sigmoid-blended Strömbom (experimental)
controllers/
  sheep/sheep.py - Webots sheep controller (uses herding.flocking_sim)
  shepherd_dog/
    shepherd_dog.py - Webots dog controller, mode-switched
    policy_loader.py - lazy SB3 PPO loader
    strombom.py - backwards-compat shim
training/
  herding_env.py - Gymnasium env (used for demo collection + eval)
  bc_pretrain.py - supervised BC of analytic teachers into MLP policy
  collect_demos.py - wrapper, see tools/
  eval.py - RL / analytic comparison harness
  parity_test.py - smoke tests
  train_ppo.py - PPO/RL fine-tune (experimental, BC alone preferred)
  requirements.txt
  configs/ppo_default.yaml
tools/
  collect_demos.py - generate (obs, action) demonstrations
  run_webots.sh - launch Webots with N sheep + chosen controller mode
worlds/
  field.wbt - main world (3 m gate, external pen)
protos/ - Sheep / ShepherdDog robot definitions
docs/project.md - original project goals
plan.md - design notes / decision log
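The "order-invariant" property of the observation builder means the policy sees the same vector no matter how the sheep are indexed. One common way to get this is to sort sheep by distance to the dog and pad to a fixed length, as sketched below. The actual 32-D layout of herding/obs.py is not described in this README, so the function name, the 22-D layout, and the padding value are all hypothetical.

```python
import math


def build_obs(dog_xy, sheep_xy, max_flock=10):
    """Fixed-length observation that does not depend on sheep ordering.

    Hypothetical layout: [dog_x, dog_y, then (dx, dy) per sheep, nearest first],
    zero-padded up to max_flock sheep.
    """
    # Sheep positions relative to the dog.
    rel = [(sx - dog_xy[0], sy - dog_xy[1]) for sx, sy in sheep_xy]
    # Sort by distance; coordinates break ties so the result is deterministic.
    rel.sort(key=lambda d: (math.hypot(d[0], d[1]), d[0], d[1]))

    obs = [float(dog_xy[0]), float(dog_xy[1])]
    for dx, dy in rel[:max_flock]:
        obs.extend((dx, dy))
    # Pad so that flocks smaller than max_flock yield the same vector length.
    obs.extend([0.0] * (2 + 2 * max_flock - len(obs)))
    return obs
```

Shuffling the sheep list leaves the output unchanged, which is what lets a single MLP policy handle any flock size from 1 to `max_flock`.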
Two cohesion regimes
Sheep cohesion strength controls which teacher works:
| Regime | flocking_sim.py setting | Strömbom | Sequential |
|---|---|---|---|
| Tight (current) | w=3.0/1.0, dist=12 | works (flock-style) | breaks (cohesion fights single-sheep targeting) |
| Loose | w=1.5/0.6, dist=8 | breaks (flock fragments at gate) | works (1-by-1 style) |
The codebase ships with the tight regime. To use the loose-regime
Sequential clone, edit those constants in herding/flocking_sim.py and
load training/runs/bc_solo/.
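To see why the two settings behave so differently, consider a generic Reynolds-style cohesion rule: each sheep is pulled toward the centroid of neighbours within a radius, scaled by a weight. The sketch below is not the code in herding/flocking_sim.py. The function name is hypothetical, and it uses only the first of the two `w` values from the table as a single weight, since the README does not say what the second value controls.

```python
import math


def cohesion_force(me, others, w, dist):
    """Pull toward the centroid of neighbours within `dist`, scaled by `w`."""
    neighbours = [o for o in others if math.dist(me, o) <= dist]
    if not neighbours:
        return (0.0, 0.0)  # no neighbour in range, so no cohesion pull
    cx = sum(o[0] for o in neighbours) / len(neighbours)
    cy = sum(o[1] for o in neighbours) / len(neighbours)
    return (w * (cx - me[0]), w * (cy - me[1]))


# A sheep 10 m from its only flockmate, under each regime (assumed mapping).
tight = cohesion_force((0.0, 0.0), [(10.0, 0.0)], w=3.0, dist=12)
loose = cohesion_force((0.0, 0.0), [(10.0, 0.0)], w=1.5, dist=8)
```

Under these assumptions the tight regime still pulls the stray back toward the flock, while the loose regime exerts no pull at 10 m. This matches the table: strong cohesion resists peeling off a single sheep, and weak cohesion lets the flock fragment.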
Results
Success rates from training.eval with --max-steps 30000 --n-seeds 5, at
deployment difficulty (full-field spawn distribution):
| n | Strömbom | Sequential | BC-flock (RL) |
|---|---|---|---|
| 1 | 100 % | 100 % | 100 % |
| 5 | 100 % | 100 % | 80–100 % |
| 8 | 100 % | 100 % | 80 % |
| 10 | 100 % | 80 % | 80 % (mean_penned 8/10) |
The BC policy reaches roughly 80 % or more of the analytic teacher's success rate using neural-network inference alone, with no hand-coded logic at run time.
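The ratio of BC to teacher success can be checked directly from the table. The snippet below is a small worked calculation, assuming Strömbom is the teacher being compared against (it is the teacher used in the quick start) and treating the n=5 entry as a low/high range.

```python
# Success rates (%) copied from the results table.
strombom = {1: 100, 5: 100, 8: 100, 10: 100}
# BC-flock entries as (low, high) to carry the 80-100 % range at n=5.
bc_flock = {1: (100, 100), 5: (80, 100), 8: (80, 80), 10: (80, 80)}


def ratio_bounds(teacher, student):
    """Mean student/teacher success ratio across flock sizes, as (low, high)."""
    lows = [student[n][0] / teacher[n] for n in teacher]
    highs = [student[n][1] / teacher[n] for n in teacher]
    return sum(lows) / len(lows), sum(highs) / len(highs)


low, high = ratio_bounds(strombom, bc_flock)
print(f"BC reaches {low:.0%} to {high:.0%} of the teacher's success rate")
```

With only five seeds per cell, every entry is a multiple of 20 %, so "80 %" means four of five runs succeeded. The figures are best read as coarse estimates.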
License
Educational project for the Topics in Intelligent Robotics course.