Autonomous Shepherd-Dog Herding (Webots + RL)

Group G25 — Diogo Costa, Johnny Fernandes, Nelson Neto

A differential-drive shepherd dog that herds 1–10 sheep through a 3 m gate into an external pen. The dog has three deployable modes:

| Mode | Source | Role |
|---|---|---|
| `strombom` | Strömbom et al. (2014) collect/drive heuristic | Analytic baseline |
| `bc` | Behaviour cloning of the Strömbom teacher | Imitation learning result |
| `rl` | KL-regularised PPO fine-tune of `bc` | Reward-driven refinement |

`sequential` (single-target pin-and-push) is kept as an alternative analytic baseline. `dagger` is a data-collection mode, not a deployment mode.
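
To make "KL-regularised" concrete: one common way to anchor a fine-tuned policy to a reference policy is to add a KL penalty to the PPO loss. The sketch below assumes diagonal-Gaussian action distributions and a penalty of the form `ppo_loss + beta * KL(pi || pi_bc)`. The function names, the KL direction, and the value of `beta` are all assumptions for illustration, not taken from `training/train_ppo`.

```python
import math


def diag_gaussian_kl(mu_p, std_p, mu_q, std_q):
    """KL(p || q) for two diagonal Gaussians given per-dimension mean and std."""
    kl = 0.0
    for mp, sp, mq, sq in zip(mu_p, std_p, mu_q, std_q):
        kl += math.log(sq / sp) + (sp**2 + (mp - mq) ** 2) / (2.0 * sq**2) - 0.5
    return kl


def anchored_loss(ppo_loss, kl_to_bc, beta=0.1):
    """PPO loss plus a penalty for drifting from the BC policy.

    beta is a hypothetical coefficient, not the project's setting.
    """
    return ppo_loss + beta * kl_to_bc
```

The penalty is zero when the fine-tuned policy matches the BC policy and grows as the action means or spreads diverge, which is what keeps the fine-tune close to its starting point.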

Perception

The dog perceives sheep only through its front-mounted 140° LiDAR (180 rays, 12 m max range — see protos/ShepherdDog.proto). Each control step:

1. Read lidar.getRangeImage().
2. Cluster returns into world-frame (x, y) estimates (herding/lidar_perception.py).
3. Fold them into a multi-target tracker that maintains last-seen positions for sheep currently outside the FOV (herding/sheep_tracker.py).
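
The first two steps can be sketched as follows. This is a minimal illustration, not the code in herding/lidar_perception.py: the FOV, ray count, and max range come from the text above, while the ray ordering (index 0 is the leftmost ray), the gap threshold `GAP`, and the function name `scan_to_centroids` are assumptions.

```python
import math

FOV_RAD = math.radians(140.0)   # from the text: 140° front-mounted LiDAR
MAX_RANGE = 12.0                # metres, from the text
GAP = 0.4                       # hypothetical: max spacing (m) inside one cluster


def scan_to_centroids(ranges, dog_x, dog_y, dog_yaw):
    """Turn one range image into world-frame (x, y) cluster centroids."""
    points = []
    n = len(ranges)
    for i, r in enumerate(ranges):
        if not math.isfinite(r) or r >= MAX_RANGE:
            continue  # no return on this ray
        # Assumed ray order: index 0 is the leftmost ray (+FOV/2).
        bearing = FOV_RAD / 2 - i * FOV_RAD / (n - 1)
        angle = dog_yaw + bearing
        points.append((dog_x + r * math.cos(angle),
                       dog_y + r * math.sin(angle)))

    # Split the ordered hits wherever two neighbours are further apart than GAP.
    clusters, current = [], []
    for p in points:
        if current and math.dist(p, current[-1]) > GAP:
            clusters.append(current)
            current = []
        current.append(p)
    if current:
        clusters.append(current)

    # One centroid per cluster.
    return [
        (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
        for c in clusters
    ]
```

A real pipeline would also need to reject wall and gate-post returns, which the layout below attributes to the "filters" in lidar_perception.py; that step is omitted here.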

The tracker outputs a {name: (x, y)} dict shaped exactly like the prior receiver-based one, so Strömbom, Sequential, and the BC obs builder all run unchanged on top of it. The 2D Gymnasium env (herding/lidar_sim.py) raycasts sheep discs at training time, so demos collected in the env match the perception the deployed controller sees in Webots.
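
A tracker with this kind of FOV memory can be sketched as a greedy nearest-neighbour association that leaves unmatched tracks at their last-seen position. The class below is an illustration under assumptions: the association radius `gate`, the `sheep_<k>` naming scheme, and the class itself are hypothetical, not the contents of herding/sheep_tracker.py.

```python
import math


class SheepTracker:
    """Greedy nearest-neighbour tracker that remembers unseen targets."""

    def __init__(self, gate=1.0):
        self.gate = gate      # hypothetical association radius in metres
        self.tracks = {}      # name -> last known (x, y)

    def update(self, detections):
        """Associate detections with tracks and return a {name: (x, y)} dict."""
        unmatched = list(detections)
        for name, pos in list(self.tracks.items()):
            if not unmatched:
                break
            nearest = min(unmatched, key=lambda d: math.dist(d, pos))
            if math.dist(nearest, pos) <= self.gate:
                self.tracks[name] = nearest
                unmatched.remove(nearest)
            # Otherwise the track keeps its last-seen position.
        # Any detection left over starts a new track.
        for det in unmatched:
            self.tracks[f"sheep_{len(self.tracks)}"] = det
        return dict(self.tracks)
```

Because the return value is a plain name-to-position dict, downstream consumers do not need to know whether a position was observed this step or remembered from an earlier one.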

For ablations, privileged ground-truth perception is available via HerdingEnv(use_lidar=False).

Quick start

# 1. Set up the Python env (any venv with PyTorch + SB3)
pip install -r training/requirements.txt

# 2. Smoke test
python -m training.parity_test

# 3. Reproduce the BC policy (~10 min on CPU: ~5 min demos + ~3 min BC)
python -m tools.collect_demos --teacher strombom \
    --out training/demos_v3.npz --seeds-per-n 15 --subsample 3 --frame-stack 4
python -m training.bc_pretrain --demos training/demos_v3.npz \
    --out training/runs/bc_v3 --epochs 60 --net-arch 512,512

# 4. Optional: DAgger from inside Webots if the sim-trained policy doesn't transfer
tools/auto_dagger.sh 3 60
python -m tools.dagger_merge_train --out training/runs/bc_dagger

# 5. Evaluate (env)
python -m training.eval --policy training/runs/bc_v3 \
    --max-flock 10 --max-steps 8000 --n-seeds 5

# 6. Optional RL fine-tune of the BC policy (~40 min on CPU, 1 M steps)
python -m training.train_ppo \
    --bc training/runs/bc_v3 \
    --out training/runs/rl_v1 \
    --total-timesteps 1000000

# 7. Run in Webots
tools/run_webots.sh 10 bc          # behaviour-cloned MLP
tools/run_webots.sh 10 rl          # KL-PPO fine-tune
tools/run_webots.sh 10 strombom    # analytic baseline

Layout

herding/                  — single source of truth (env + Webots both import)
  geometry.py             — field/pen constants, robot specs
  flocking_sim.py         — Reynolds-style sheep dynamics
  diffdrive.py            — differential-drive kinematics
  control.py              — shared near-sheep speed-modulation helper
  obs.py                  — 32-D order-invariant observation builder
  strombom.py             — canonical CoM-drive teacher
  sequential.py           — single-target "pin-and-push" teacher
  active_scan.py          — wraps a base teacher with opening rotation +
                            walk-to-centre + speed modulation
  lidar_sim.py            — fast 2D raycast for the env (sheep + walls + posts)
  lidar_perception.py     — scan → world-frame cluster centroids + filters
  sheep_tracker.py        — multi-target NN tracker with FOV memory

controllers/
  sheep/sheep.py          — Webots sheep controller (uses herding.flocking_sim)
  shepherd_dog/
    shepherd_dog.py       — Webots dog controller, mode-switched
    policy_loader.py      — lazy SB3 policy loader (auto-detects frame stack)

training/
  herding_env.py          — Gymnasium env (LiDAR + tracker by default)
  bc_pretrain.py          — supervised BC of (obs, action) demos into MLP
  eval.py                 — analytic + BC policy comparison harness
  parity_test.py          — shape / determinism smoke test
  runs/                   — checkpoints (whitelisted in .gitignore)
  requirements.txt

tools/
  collect_demos.py        — sim demos via the active-scan teacher
  dagger_merge_train.py   — merge Webots-collected DAgger demos and retrain
  run_webots.sh           — launch Webots with N sheep + chosen mode
  auto_dagger.sh          — headless DAgger collection across many runs

worlds/
  field.wbt               — main world (3 m gate, external pen)

protos/                   — Sheep / ShepherdDog robot definitions
docs/project.md           — original project goals

Shared low-level control

Every dog mode (RL, Strömbom, Sequential, the DAgger teacher) routes its action through herding/control.py:modulate_speed_near_sheep, which scales action magnitude down when within ~2.5 m of the nearest tracked sheep. This stops the dog from charging in at full speed and scattering the flock. Direction (intent) is preserved.
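
A minimal sketch of this kind of speed modulation, assuming a 2-D action tuple and a linear ramp down to a floor: the 2.5 m radius comes from the text, while `MIN_SCALE`, the ramp shape, and the name `slow_down_near_sheep` are hypothetical and may differ from herding/control.py.

```python
import math

SLOW_RADIUS = 2.5   # metres, from the text ("~2.5 m")
MIN_SCALE = 0.3     # hypothetical floor on the speed scale


def slow_down_near_sheep(action, dog_xy, sheep_xy):
    """Scale a 2-D action down near the closest sheep, keeping its direction."""
    if not sheep_xy:
        return action
    nearest = min(math.dist(dog_xy, s) for s in sheep_xy)
    if nearest >= SLOW_RADIUS:
        return action
    # Linear ramp: MIN_SCALE at contact, 1.0 at the edge of the slow zone.
    scale = MIN_SCALE + (1.0 - MIN_SCALE) * nearest / SLOW_RADIUS
    return (action[0] * scale, action[1] * scale)
```

Multiplying both components by the same factor is what preserves the intent: the ratio between the components is unchanged, only the magnitude shrinks.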

All modes also share the same EMA action smoother in controllers/shepherd_dog/shepherd_dog.py:ACTION_SMOOTH = 0.55.
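
An exponential moving average over actions can be sketched as below. The constant 0.55 is from the text; whether it weights the previous action (as assumed here) or the new one is not stated, and the function name `smooth_action` is hypothetical.

```python
ACTION_SMOOTH = 0.55  # from the text


def smooth_action(prev, new, alpha=ACTION_SMOOTH):
    """Blend the previous smoothed action with the new command, per component."""
    # Assumed convention: alpha is the weight kept on the previous action.
    return tuple(alpha * p + (1.0 - alpha) * n for p, n in zip(prev, new))
```

Under this convention, a dog starting from rest and given a constant full command reaches 45 % of it after one step, so abrupt policy outputs are spread over several control steps.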

Webots results (steps to all-penned, fast mode)

Single seed per cell using worlds/field.wbt defaults. All modes hit 100 % pen rate; numbers shown are time-to-all-penned in simulation steps (16 ms each).

| n (sheep) | Strömbom | `bc` | `rl` (KL-PPO of `bc`) |
|---|---|---|---|
| 3 | 5 800 | 9 800 | 4 800 |
| 5 | 10 200 | 9 200 | 9 800 |
| 8 | 14 000 | 17 600 | 15 400 |
| 10 | 18 600 | 19 600 | 12 000 |

The RL fine-tune is 39 % faster than `bc` on n=10 and 51 % faster on n=3 (12.5 % faster on n=8, about 6.5 % slower on n=5). With a single seed per cell, this indicates that the KL-anchored PPO finds reward-driven improvements over the BC imitation baseline rather than just collapsing back to it.
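
The percentages can be recomputed from the table with a few lines of arithmetic. The step counts and the 16 ms step length are copied from above; the helper name `speedup_pct` is only for illustration.

```python
STEP_S = 0.016  # 16 ms per simulation step, from the text

# Steps to all-penned, copied from the table: n -> (strombom, bc, rl)
results = {
    3: (5800, 9800, 4800),
    5: (10200, 9200, 9800),
    8: (14000, 17600, 15400),
    10: (18600, 19600, 12000),
}


def speedup_pct(baseline, candidate):
    """Percentage reduction in steps relative to the baseline."""
    return 100.0 * (baseline - candidate) / baseline


for n, (strombom, bc, rl) in results.items():
    print(f"n={n}: rl vs bc {speedup_pct(bc, rl):+.1f} %, "
          f"rl vs strombom {speedup_pct(strombom, rl):+.1f} %, "
          f"rl sim time {rl * STEP_S:.1f} s")
```

A positive value means fewer steps than the baseline. Multiplying by `STEP_S` converts steps to simulated seconds, not wall-clock time in fast mode.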

License

Educational project for the Topics in Intelligent Robotics course.
