Checkpoint 3

2026-05-10 12:46:14 +01:00
parent 1bb9415414
commit 2a6db038df
16 changed files with 305 additions and 662 deletions
@@ -0,0 +1,115 @@
+# Autonomous Shepherd-Dog Herding (Webots + RL)
+
+Group G25 — *Diogo Costa, Johnny Fernandes, Nelson Neto*
+
+A differential-drive shepherd dog that herds 1–10 sheep through a 3 m
+gate into an external pen. The dog has three modes:
+
+| Mode | Source | Notes |
+|---|---|---|
+| `rl` | Behavior cloning of an analytic teacher | The deliverable RL policy |
+| `strombom` | Strömbom (2014) collect/drive heuristic | Canonical baseline |
+| `sequential` | Single-target "pin and push" | Robust across n=1–10 |
+
+Three further experimental teachers (`hybrid`, `drive_only`,
+`strombom_smooth`) are documented in `herding/`.
+
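For readers unfamiliar with the `strombom` mode, the collect/drive rule can be sketched in a few lines. This is a minimal illustration based on the rule as published by Strömbom et al. (2014), not the code in `herding/strombom.py`, which may differ. The function name `strombom_target`, the helper `_unit`, and the default `r_a` value are assumptions made for the example.

```python
import math


def _unit(dx, dy):
    """Return the unit vector of (dx, dy), or (0, 0) for a zero vector."""
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        return 0.0, 0.0
    return dx / norm, dy / norm


def strombom_target(sheep, goal, r_a=2.0):
    """Pick the dog's steering point for one step of the collect/drive rule.

    sheep: list of (x, y) sheep positions
    goal:  (x, y) of the gate or pen
    r_a:   sheep-sheep interaction radius (hypothetical default)
    """
    n = len(sheep)
    # Global centre of mass of the flock.
    cx = sum(x for x, _ in sheep) / n
    cy = sum(y for _, y in sheep) / n
    dists = [math.hypot(x - cx, y - cy) for x, y in sheep]
    i_far = max(range(n), key=dists.__getitem__)

    if dists[i_far] <= r_a * n ** (2.0 / 3.0):
        # Flock is compact: stand behind it, on the side away from the goal.
        ux, uy = _unit(cx - goal[0], cy - goal[1])
        d = r_a * math.sqrt(n)
        return "drive", (cx + d * ux, cy + d * uy)

    # Flock is scattered: stand behind the farthest sheep, pushing it inward.
    fx, fy = sheep[i_far]
    ux, uy = _unit(fx - cx, fy - cy)
    return "collect", (fx + r_a * ux, fy + r_a * uy)


if __name__ == "__main__":
    print(strombom_target([(0, 0), (1, 0), (0, 1), (1, 1)], goal=(10, 0.5)))
    print(strombom_target([(0, 0), (0.5, 0), (20, 0)], goal=(10, 0)))
```

The dog then steers toward the returned point. How that point becomes wheel commands is outside this sketch.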
+## Quick start
+
+```bash
+# 1. Set up the Python env (any venv with PyTorch + SB3)
+pip install -r training/requirements.txt
+
+# 2. Smoke test
+python -m training.parity_test
+
+# 3. Reproduce the BC policy from scratch (~25 min on CPU)
+python -m tools.collect_demos --teacher strombom --out training/demos.npz \
+    --seeds-per-n 30 --subsample 3
+python -m training.bc_pretrain --demos training/demos.npz \
+    --out training/runs/bc_flock --epochs 100 --net-arch 512,512
+
+# 4. Evaluate
+python -m training.eval --policy training/runs/bc_flock \
+    --max-flock 10 --max-steps 30000 --n-seeds 5
+
+# 5. Run in Webots (any of the three modes; n is the flock size)
+HERDING_POLICY_DIR=$PWD/training/runs/bc_flock tools/run_webots.sh 10 rl
+tools/run_webots.sh 10 strombom
+tools/run_webots.sh 10 sequential
+```
+
+## Layout
+
+```
+herding/                  — single source of truth (imported by both the env and the Webots controllers)
+  geometry.py             — field/pen constants, robot specs
+  flocking_sim.py         — Reynolds-style sheep dynamics
+  diffdrive.py            — differential-drive kinematics
+  obs.py                  — 32-D order-invariant observation builder
+  strombom.py             — canonical CoM-drive teacher
+  sequential.py           — single-target "pin-and-push" teacher
+  hybrid.py               — flock-then-funnel (experimental, did not scale)
+  drive_only.py           — Strömbom drive without collect (experimental)
+  strombom_smooth.py      — sigmoid-blended Strömbom (experimental)
+
+controllers/
+  sheep/sheep.py          — Webots sheep controller (uses herding.flocking_sim)
+  shepherd_dog/
+    shepherd_dog.py       — Webots dog controller, mode-switched
+    policy_loader.py      — lazy SB3 PPO loader
+    strombom.py           — backwards-compat shim
+
+training/
+  herding_env.py          — Gymnasium env (used for demo collection + eval)
+  bc_pretrain.py          — supervised BC of analytic teachers into MLP policy
+  collect_demos.py        — wrapper, see tools/
+  eval.py                 — RL / analytic comparison harness
+  parity_test.py          — smoke tests
+  train_ppo.py            — PPO/RL fine-tune (experimental, BC alone preferred)
+  requirements.txt
+  configs/ppo_default.yaml
+
+tools/
+  collect_demos.py        — generate (obs, action) demonstrations
+  run_webots.sh           — launch Webots with N sheep + chosen controller mode
+
+worlds/
+  field.wbt               — main world (3 m gate, external pen)
+
+protos/                   — Sheep / ShepherdDog robot definitions
+docs/project.md           — original project goals
+plan.md                   — design notes / decision log
+```
+
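The `diffdrive.py` entry refers to differential-drive kinematics. As a point of reference, here is a sketch of the standard unicycle-style update for such a robot, using explicit Euler integration. The function name, the wheel radius, and the axle length are placeholder assumptions, not values taken from `herding/geometry.py`.

```python
import math


def diffdrive_step(x, y, theta, w_left, w_right, dt,
                   wheel_radius=0.05, axle_length=0.3):
    """Advance a differential-drive pose by one Euler step.

    w_left / w_right are wheel angular speeds in rad/s; wheel_radius and
    axle_length (metres) are hypothetical defaults for illustration.
    """
    v_left = wheel_radius * w_left
    v_right = wheel_radius * w_right
    v = 0.5 * (v_left + v_right)              # forward speed of the body
    omega = (v_right - v_left) / axle_length  # yaw rate, positive = left turn

    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    # Wrap the heading into [-pi, pi).
    theta = (theta + omega * dt + math.pi) % (2.0 * math.pi) - math.pi
    return x, y, theta


if __name__ == "__main__":
    # Equal wheel speeds: straight-line motion along the heading.
    print(diffdrive_step(0.0, 0.0, 0.0, 10.0, 10.0, dt=1.0))
    # Opposite wheel speeds: rotation in place.
    print(diffdrive_step(0.0, 0.0, 0.0, -10.0, 10.0, dt=0.1))
```

Equal wheel speeds move the robot straight ahead, and opposite speeds spin it in place, which is a quick sanity check for any implementation of this model.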
+## Two cohesion regimes
+
+Sheep cohesion strength controls which teacher works:
+
+| Regime | `flocking_sim.py` setting | Strömbom | Sequential |
+|---|---|---:|---:|
+| **Tight** (current) | `w=3.0/1.0`, `dist=12` | works (flock-style) | breaks (cohesion fights single-sheep targeting) |
+| Loose | `w=1.5/0.6`, `dist=8` | breaks (flock fragments at gate) | works (1-by-1 style) |
+
+The codebase ships with the **tight** regime. To use the loose-regime
+Sequential clone, change those constants in `herding/flocking_sim.py` to
+the loose values from the table and load `training/runs/bc_solo/`.
+
+## Results
+
+Success rates from evaluation at `--max-steps 30000 --n-seeds 5`, at
+deployment difficulty (full-field spawn distribution):
+
+| n | Strömbom | Sequential | BC-flock (RL) |
+|---:|---:|---:|---:|
+| 1 | 100 % | 100 % | 100 % |
+| 5 | 100 % | 100 % | 80–100 % |
+| 8 | 100 % | 100 % | 80 % |
+| 10 | **100 %** | 80 % | **80 %** (mean_penned 8/10) |
+
+At n = 8 and n = 10 the BC policy reaches about 80 % of the analytic
+teacher's success rate while running entirely on neural-network inference,
+with no hand-coded logic.
+
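Assuming `--n-seeds 5` means five episodes per table cell, the percentages can only move in steps of 20 %. The short calculation below converts the table's rates into episode counts and into a ratio against the Strömbom column. The dictionaries simply restate the table; the n = 5 entry for BC-flock is taken at its lower bound of 80 %, which is an assumption.

```python
N_SEEDS = 5  # from --n-seeds 5; assumed to mean five episodes per cell

# Success rates restated from the Results table (fractions, not percent).
strombom = {1: 1.0, 5: 1.0, 8: 1.0, 10: 1.0}
bc_flock = {1: 1.0, 5: 0.8, 8: 0.8, 10: 0.8}  # n=5 uses the lower bound


def successes(rate, n_seeds=N_SEEDS):
    """Number of successful episodes implied by a success rate."""
    return round(rate * n_seeds)


# BC success rate relative to the Strömbom column, per flock size.
relative = {n: bc_flock[n] / strombom[n] for n in bc_flock}

if __name__ == "__main__":
    for n in sorted(bc_flock):
        print(f"n={n:2d}: BC {successes(bc_flock[n])}/{N_SEEDS} episodes, "
              f"{relative[n]:.0%} of the teacher rate")
```

Under that assumption, the gap between 80 % and 100 % corresponds to a single failed episode, so the comparison is best read as indicative rather than statistically tight.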
+## License
+
+Educational project for the *Topics in Intelligent Robotics* course.