Johnny Fernandes 2a6db038df Checkpoint 3
2026-05-10 12:46:14 +01:00

Autonomous Shepherd-Dog Herding (Webots + RL)

Group G25 — Diogo Costa, Johnny Fernandes, Nelson Neto

A differential-drive shepherd dog that herds 1–10 sheep through a 3 m gate into an external pen. The dog has three modes:

| Mode | Source | Notes |
|---|---|---|
| rl | Behavior cloning of an analytic teacher | The deliverable RL policy |
| strombom | Strömbom (2014) collect/drive heuristic | Canonical baseline |
| sequential | Single-target "pin and push" | Robust across n = 1–10 |

Three further experimental teachers (hybrid, drive_only, strombom_smooth) are documented in herding/.
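
The collect/drive rule behind the strombom mode can be sketched as below. This follows the heuristic as published by Strömbom et al. (2014) and is not taken from herding/strombom.py, which may differ in detail. The function name strombom_target, the helper _unit, and the default r_a = 2.0 are assumptions chosen for illustration.

```python
import math

def _unit(dx, dy):
    """Normalise a 2-D vector; return (0, 0) for a near-zero input."""
    norm = math.hypot(dx, dy)
    return (dx / norm, dy / norm) if norm > 1e-9 else (0.0, 0.0)

def strombom_target(sheep, goal, r_a=2.0):
    """Return (mode, (x, y)): the point the dog should steer towards.

    sheep: list of (x, y) positions; goal: (x, y) of the pen entrance.
    r_a is the sheep-to-sheep interaction distance (illustrative value).
    """
    n = len(sheep)
    # Global centre of mass (GCM) of the flock.
    cx = sum(x for x, _ in sheep) / n
    cy = sum(y for _, y in sheep) / n

    # The furthest sheep from the GCM decides whether the flock is cohesive.
    far = max(sheep, key=lambda p: math.hypot(p[0] - cx, p[1] - cy))
    far_dist = math.hypot(far[0] - cx, far[1] - cy)

    if far_dist <= r_a * n ** (2.0 / 3.0):
        # Drive: stand behind the GCM, on the side opposite the goal.
        ux, uy = _unit(cx - goal[0], cy - goal[1])
        offset = r_a * math.sqrt(n)
        return "drive", (cx + ux * offset, cy + uy * offset)

    # Collect: stand behind the straggler, on the side away from the GCM.
    ux, uy = _unit(far[0] - cx, far[1] - cy)
    return "collect", (far[0] + ux * r_a, far[1] + uy * r_a)
```

A compact flock yields a "drive" target behind the centre of mass, while a single distant straggler switches the result to "collect". Turning the target point into wheel commands is left to the controller.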

Quick start

# 1. Set up the Python env (any venv with PyTorch + SB3)
pip install -r training/requirements.txt

# 2. Smoke test
python -m training.parity_test

# 3. Reproduce the BC policy from scratch (~25 min on CPU)
python -m tools.collect_demos --teacher strombom --out training/demos.npz \
    --seeds-per-n 30 --subsample 3
python -m training.bc_pretrain --demos training/demos.npz \
    --out training/runs/bc_flock --epochs 100 --net-arch 512,512

# 4. Evaluate
python -m training.eval --policy training/runs/bc_flock \
    --max-flock 10 --max-steps 30000 --n-seeds 5

# 5. Run in Webots (any of the three modes; n is the flock size)
HERDING_POLICY_DIR=$PWD/training/runs/bc_flock tools/run_webots.sh 10 rl
tools/run_webots.sh 10 strombom
tools/run_webots.sh 10 sequential
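
Step 3 is behavior cloning: record (observation, action) pairs from an analytic teacher, then fit a policy to them by supervised regression. The sketch below shows only that idea and is not the code in training/bc_pretrain.py. It swaps the MLP for a two-weight linear policy trained with plain SGD so that it runs on the standard library alone, and teacher, collect_demos, and bc_fit are hypothetical names.

```python
import random

def teacher(obs):
    """Stand-in analytic teacher: a fixed linear rule (hypothetical)."""
    return 0.8 * obs[0] - 0.5 * obs[1]

def collect_demos(n_samples, rng):
    """Record (observation, teacher action) pairs from random observations."""
    demos = []
    for _ in range(n_samples):
        obs = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        demos.append((obs, teacher(obs)))
    return demos

def bc_fit(demos, epochs=200, lr=0.1):
    """Fit a linear policy a = w . obs by SGD on the squared action error."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for obs, act in demos:
            err = (w[0] * obs[0] + w[1] * obs[1]) - act
            # Gradient of 0.5 * err**2 with respect to each weight.
            w[0] -= lr * err * obs[0]
            w[1] -= lr * err * obs[1]
    return w

demos = collect_demos(200, random.Random(0))
weights = bc_fit(demos)
print(weights)  # close to the teacher's coefficients [0.8, -0.5]
```

Because the stand-in teacher is itself linear and noise-free, the clone recovers it almost exactly. A real herding teacher is not linear, which is one reason to use a larger network such as the 512,512 MLP requested in step 3.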

Layout

herding/                  — single source of truth (env + Webots both import)
  geometry.py             — field/pen constants, robot specs
  flocking_sim.py         — Reynolds-style sheep dynamics
  diffdrive.py            — differential-drive kinematics
  obs.py                  — 32-D order-invariant observation builder
  strombom.py             — canonical CoM-drive teacher
  sequential.py           — single-target "pin-and-push" teacher
  hybrid.py               — flock-then-funnel (experimental, did not scale)
  drive_only.py           — Strömbom drive without collect (experimental)
  strombom_smooth.py      — sigmoid-blended Strömbom (experimental)

controllers/
  sheep/sheep.py          — Webots sheep controller (uses herding.flocking_sim)
  shepherd_dog/
    shepherd_dog.py       — Webots dog controller, mode-switched
    policy_loader.py      — lazy SB3 PPO loader
    strombom.py           — backwards-compat shim

training/
  herding_env.py          — Gymnasium env (used for demo collection + eval)
  bc_pretrain.py          — supervised BC of analytic teachers into MLP policy
  collect_demos.py        — wrapper, see tools/
  eval.py                 — RL / analytic comparison harness
  parity_test.py          — smoke tests
  train_ppo.py            — PPO/RL fine-tune (experimental, BC alone preferred)
  requirements.txt
  configs/ppo_default.yaml

tools/
  collect_demos.py        — generate (obs, action) demonstrations
  run_webots.sh           — launch Webots with N sheep + chosen controller mode

worlds/
  field.wbt               — main world (3 m gate, external pen)

protos/                   — Sheep / ShepherdDog robot definitions
docs/project.md           — original project goals
plan.md                   — design notes / decision log
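
For readers new to the kinematics that herding/diffdrive.py is described as covering, here is a minimal sketch of the standard differential-drive model. It is not copied from the repository. The function names, argument names, and the Euler integration scheme are assumptions.

```python
import math

def diffdrive_step(x, y, theta, v_left, v_right, wheel_base, dt):
    """Advance a differential-drive pose by one explicit Euler step.

    v_left / v_right are wheel rim speeds in m/s; wheel_base is the
    distance between the two wheels in m.
    """
    v = 0.5 * (v_right + v_left)             # forward speed
    omega = (v_right - v_left) / wheel_base  # yaw rate, positive is counter-clockwise
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta += omega * dt
    # Wrap the heading back into [-pi, pi].
    theta = math.atan2(math.sin(theta), math.cos(theta))
    return x, y, theta

def wheel_speeds(v, omega, wheel_base):
    """Inverse map: body command (v, omega) to (v_left, v_right)."""
    half = 0.5 * omega * wheel_base
    return v - half, v + half
```

Equal wheel speeds move the robot straight ahead, and opposite wheel speeds turn it on the spot. wheel_speeds is the map a controller would use to turn a (forward speed, yaw rate) command into motor targets.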

Two cohesion regimes

Sheep cohesion strength controls which teacher works:

Regime flocking_sim.py setting Strömbom Sequential
Tight (current) w=3.0/1.0, dist=12 works (flock-style) breaks (cohesion fights single-sheep targeting)
Loose w=1.5/0.6, dist=8 breaks (flock fragments at gate) works (1-by-1 style)

The codebase ships with the tight regime. To use the loose-regime Sequential clone, edit those constants in herding/flocking_sim.py and load training/runs/bc_solo/.
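
To illustrate why strong cohesion fights single-sheep targeting, the toy model below sums a cohesion pull towards nearby flockmates and a flee push away from the dog. It is a sketch under loud assumptions, not the dynamics in herding/flocking_sim.py. The README does not say what the two w values weigh, so the cohesion weights 3.0 and 1.5 are borrowed from the table on the guess that the first figure is a cohesion weight. The flee weight 2.0 and every name here are hypothetical.

```python
import math

def _unit(dx, dy):
    """Normalise a 2-D vector; return (0, 0) for a near-zero input."""
    norm = math.hypot(dx, dy)
    return (dx / norm, dy / norm) if norm > 1e-9 else (0.0, 0.0)

def sheep_velocity(sheep, neighbours, dog, w_cohesion, w_flee, neighbour_dist):
    """Net steering vector for one sheep (toy model, hypothetical weights)."""
    near = [p for p in neighbours if math.dist(p, sheep) <= neighbour_dist]
    vx = vy = 0.0
    if near:
        # Cohesion: pull towards the centroid of neighbours in range.
        cx = sum(p[0] for p in near) / len(near)
        cy = sum(p[1] for p in near) / len(near)
        ux, uy = _unit(cx - sheep[0], cy - sheep[1])
        vx += w_cohesion * ux
        vy += w_cohesion * uy
    # Flee: push directly away from the dog.
    ux, uy = _unit(sheep[0] - dog[0], sheep[1] - dog[1])
    vx += w_flee * ux
    vy += w_flee * uy
    return vx, vy

flock = [(-2.0, 0.0), (-3.0, 1.0), (-3.0, -1.0)]
dog = (-1.0, 0.0)  # dog sits between the target sheep and the flock
tight = sheep_velocity((0.0, 0.0), flock, dog, 3.0, 2.0, 12.0)
loose = sheep_velocity((0.0, 0.0), flock, dog, 1.5, 2.0, 8.0)
```

In this toy setup the tight setting gives a net pull back towards the flock (negative x) despite the dog, while the loose setting lets the dog peel the sheep away (positive x).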

Results

Evaluated with --max-steps 30000 --n-seeds 5 at deployment difficulty (full-field spawn distribution); entries are success rates:

| n | Strömbom | Sequential | BC-flock (RL) |
|---|---|---|---|
| 1 | 100 % | 100 % | 100 % |
| 5 | 100 % | 100 % | 80–100 % |
| 8 | 100 % | 100 % | 80 % |
| 10 | 100 % | 80 % | 80 % (mean_penned 8/10) |

The BC policy reaches roughly 80 % of the analytic teacher's success rate while running entirely on neural-network inference, with no hand-coded logic.
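
With only five seeds per cell, each episode is worth 20 percentage points, so the table can only show multiples of 20 %. The sketch below shows how a success rate and a mean_penned figure could be aggregated from per-seed results. It assumes an episode succeeds only when every sheep is penned, which the README does not spell out. The function name summarize and the per-seed numbers are hypothetical, not logged results.

```python
def summarize(penned_per_seed, flock_size):
    """Aggregate per-seed penned counts into summary statistics.

    An episode counts as a success only if all sheep were penned
    (assumed definition).
    """
    n_seeds = len(penned_per_seed)
    successes = sum(1 for p in penned_per_seed if p == flock_size)
    return {
        "success_rate": 100.0 * successes / n_seeds,
        "mean_penned": sum(penned_per_seed) / n_seeds,
    }

# Hypothetical outcome for n = 10: four full successes and one empty pen.
result = summarize([10, 10, 10, 10, 0], flock_size=10)
print(result)  # {'success_rate': 80.0, 'mean_penned': 8.0}
```

Other per-seed outcomes would give the same 80 % success rate with a different mean_penned, which is why reporting both numbers is informative.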

License

Educational project for the Topics in Intelligent Robotics course.