# Autonomous Shepherd-Dog Herding (Webots + RL)

Group G25: *Diogo Costa, Johnny Fernandes, Nelson Neto*

A differential-drive shepherd dog that herds 1–10 sheep through a 3 m
gate into an external pen. The dog has three modes:

| Mode | Source | Notes |
|---|---|---|
| `rl` | Behavior cloning (BC) of an analytic teacher | The deliverable RL policy |
| `strombom` | Strömbom et al. (2014) collect/drive heuristic | Canonical baseline |
| `sequential` | Single-target "pin and push" | Robust across n = 1–10 |

Three experimental teachers (`hybrid`, `drive_only`,
`strombom_smooth`) are also documented; see `herding/` for details.

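For readers new to the `strombom` baseline, the published heuristic switches between two steering targets. If every sheep lies within f(N) = r_a·N^(2/3) of the flock's global centre of mass (GCM), the dog *drives*: it stands r_a·√N behind the GCM on the side away from the goal. Otherwise it *collects*: it stands r_a behind the sheep furthest from the GCM, so pushing that sheep moves it back toward the flock. The sketch below follows that paper-level formulation only; the function name, the tuple representation and the default `r_a` are assumptions for illustration, not the code in `herding/strombom.py`, and details such as the paper's speed and noise terms are omitted.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float]


def _unit(dx: float, dy: float) -> Vec:
    """Unit vector of (dx, dy), or (0, 0) for a zero vector."""
    norm = math.hypot(dx, dy)
    return (dx / norm, dy / norm) if norm > 0 else (0.0, 0.0)


def strombom_target(sheep: Sequence[Vec], goal: Vec, r_a: float = 1.0) -> Tuple[str, Vec]:
    """Hypothetical collect/drive rule; assumes at least one sheep.

    sheep: sheep positions; goal: where the flock should go (e.g. the gate);
    r_a: sheep-sheep interaction distance (illustrative default).
    Returns ("drive" | "collect", target position for the dog).
    """
    n = len(sheep)
    gx = sum(p[0] for p in sheep) / n          # global centre of mass (GCM)
    gy = sum(p[1] for p in sheep) / n
    f_n = r_a * n ** (2.0 / 3.0)               # cohesion threshold f(N)

    # The sheep furthest from the GCM decides whether the flock is cohesive.
    far = max(sheep, key=lambda p: math.hypot(p[0] - gx, p[1] - gy))
    if math.hypot(far[0] - gx, far[1] - gy) <= f_n:
        # Drive: r_a * sqrt(N) behind the GCM, on the goal -> GCM line.
        ux, uy = _unit(gx - goal[0], gy - goal[1])
        d = r_a * math.sqrt(n)
        return "drive", (gx + ux * d, gy + uy * d)

    # Collect: r_a behind the furthest sheep, on the GCM -> sheep line.
    ux, uy = _unit(far[0] - gx, far[1] - gy)
    return "collect", (far[0] + ux * r_a, far[1] + uy * r_a)
```

The `strombom_smooth` variant listed in `herding/` is described as a sigmoid blend; one reading is that it replaces the hard `<= f_n` switch above with a smooth weighting of the two targets, but that is an interpretation, not something this README specifies.
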
## Quick start

```bash
# 1. Set up the Python env (any venv with PyTorch + Stable-Baselines3)
pip install -r training/requirements.txt

# 2. Smoke test
python -m training.parity_test

# 3. Reproduce the BC policy from scratch (~25 min on CPU)
python -m tools.collect_demos --teacher strombom --out training/demos.npz \
    --seeds-per-n 30 --subsample 3
python -m training.bc_pretrain --demos training/demos.npz \
    --out training/runs/bc_flock --epochs 100 --net-arch 512,512

# 4. Evaluate
python -m training.eval --policy training/runs/bc_flock \
    --max-flock 10 --max-steps 30000 --n-seeds 5

# 5. Run in Webots (any of the three modes; the first argument is the flock size n)
HERDING_POLICY_DIR="$PWD/training/runs/bc_flock" tools/run_webots.sh 10 rl
tools/run_webots.sh 10 strombom
tools/run_webots.sh 10 sequential
```

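Step 5 hands the trained policy to the Webots controller through the `HERDING_POLICY_DIR` environment variable, and the layout lists `controllers/shepherd_dog/policy_loader.py` as a lazy SB3 PPO loader. A minimal sketch of that pattern follows. It uses Stable-Baselines3's real `PPO.load` and `predict(obs, deterministic=True)` calls, but the class name, the injectable `load_fn` and the `model.zip` file name inside the run directory are hypothetical. One plausible benefit of loading lazily is that `strombom` and `sequential` runs never pay the PyTorch/SB3 import cost.

```python
import os
from pathlib import Path
from typing import Any, Callable, Optional


def _sb3_load(path: Path) -> Any:
    # Import inside the function so SB3/PyTorch load only when a policy is used.
    from stable_baselines3 import PPO
    return PPO.load(str(path))


class LazyPolicy:
    """Hypothetical loader: reads $HERDING_POLICY_DIR and loads on first act()."""

    def __init__(self, filename: str = "model.zip",
                 load_fn: Callable[[Path], Any] = _sb3_load) -> None:
        self._filename = filename      # hypothetical file name inside the run dir
        self._load_fn = load_fn
        self._model: Optional[Any] = None

    def _path(self) -> Path:
        run_dir = os.environ.get("HERDING_POLICY_DIR")
        if not run_dir:
            raise RuntimeError("HERDING_POLICY_DIR is not set (needed for mode 'rl')")
        return Path(run_dir) / self._filename

    def act(self, obs: Any) -> Any:
        if self._model is None:        # first call: resolve the path and load once
            self._model = self._load_fn(self._path())
        action, _state = self._model.predict(obs, deterministic=True)
        return action
```

Passing `load_fn` explicitly also makes the loader testable without SB3 installed, since a stub object with a `predict` method can stand in for the model.
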
## Layout

```
herding/                 # single source of truth (env + Webots both import)
  geometry.py            # field/pen constants, robot specs
  flocking_sim.py        # Reynolds-style sheep dynamics
  diffdrive.py           # differential-drive kinematics
  obs.py                 # 32-D order-invariant observation builder
  strombom.py            # canonical CoM-drive teacher
  sequential.py          # single-target "pin-and-push" teacher
  hybrid.py              # flock-then-funnel (experimental, did not scale)
  drive_only.py          # Strömbom drive without collect (experimental)
  strombom_smooth.py     # sigmoid-blended Strömbom (experimental)

controllers/
  sheep/sheep.py         # Webots sheep controller (uses herding.flocking_sim)
  shepherd_dog/
    shepherd_dog.py      # Webots dog controller, mode-switched
    policy_loader.py     # lazy SB3 PPO loader
    strombom.py          # backwards-compat shim

training/
  herding_env.py         # Gymnasium env (used for demo collection + eval)
  bc_pretrain.py         # supervised BC of analytic teachers into MLP policy
  collect_demos.py       # wrapper, see tools/
  eval.py                # RL / analytic comparison harness
  parity_test.py         # smoke tests
  train_ppo.py           # PPO/RL fine-tune (experimental, BC alone preferred)
  requirements.txt
  configs/ppo_default.yaml

tools/
  collect_demos.py       # generate (obs, action) demonstrations
  run_webots.sh          # launch Webots with N sheep + chosen controller mode

worlds/
  field.wbt              # main world (3 m gate, external pen)

protos/                  # Sheep / ShepherdDog robot definitions
docs/project.md          # original project goals
plan.md                  # design notes / decision log
```

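`obs.py` is described as a 32-D *order-invariant* observation builder: the policy should see the same vector however the sheep happen to be indexed, and the vector length must not depend on n. The actual 32 features are not documented in this README, so the sketch below only illustrates the idea with a few hypothetical permutation-invariant features (7 values: flock centroid, mean and max spread, and the furthest sheep, all relative to the dog, plus normalised flock size). The function name and `max_flock` parameter are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]


def invariant_features(dog: Vec, sheep: Sequence[Vec], max_flock: int = 10) -> List[float]:
    """Hypothetical order-invariant features (7-D, not the project's 32-D layout)."""
    n = len(sheep)
    # math.fsum is exactly rounded, so these sums do not depend on sheep order.
    cx = math.fsum(p[0] for p in sheep) / n
    cy = math.fsum(p[1] for p in sheep) / n
    dists = [math.hypot(p[0] - cx, p[1] - cy) for p in sheep]
    # Break distance ties by position, not list index, so reordering the
    # input cannot change which sheep is reported as the furthest.
    far = max(sheep, key=lambda p: (math.hypot(p[0] - cx, p[1] - cy), p))
    return [
        cx - dog[0], cy - dog[1],          # centroid relative to the dog
        math.fsum(dists) / n, max(dists),  # mean and max distance to centroid
        far[0] - dog[0], far[1] - dog[1],  # furthest sheep relative to the dog
        n / max_flock,                     # flock size, normalised by max_flock
    ]
```

An alternative is a fixed number of per-sheep slots padded up to `max_flock`; that is only order-invariant if the slots are sorted by a canonical key (for example distance to the dog) before being filled.
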
## Two cohesion regimes

Sheep cohesion strength determines which teacher works:

| Regime | `flocking_sim.py` setting | Strömbom | Sequential |
|---|---|---|---|
| **Tight** (current) | `w=3.0/1.0`, `dist=12` | works (flock-style) | breaks (cohesion fights single-sheep targeting) |
| Loose | `w=1.5/0.6`, `dist=8` | breaks (flock fragments at the gate) | works (one-by-one style) |

The codebase ships with the **tight** regime. To use the loose-regime BC
clone of the Sequential teacher, edit those constants in
`herding/flocking_sim.py` and point `HERDING_POLICY_DIR` at
`training/runs/bc_solo/` when running in `rl` mode.

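Why stronger cohesion breaks single-sheep targeting can be seen in a generic Reynolds-style update: each sheep is pulled toward the centroid of neighbours within `dist`, pushed away from neighbours closer than `sep_dist`, and flees the dog when it is near. This is a sketch, not `flocking_sim.py`; the parameter names `w_coh`, `w_sep`, `sep_dist`, `w_dog` and `dog_dist` are hypothetical, and reading the table's `w=3.0/1.0` as a cohesion/separation pair is an assumption made only for illustration.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float]


def reynolds_velocity(i: int, sheep: Sequence[Vec], dog: Vec,
                      w_coh: float, w_sep: float, dist: float,
                      sep_dist: float = 1.0, w_dog: float = 2.0,
                      dog_dist: float = 5.0) -> Vec:
    """Desired velocity of sheep i (hypothetical weights, not flocking_sim.py)."""
    x, y = sheep[i]
    neigh = [p for j, p in enumerate(sheep)
             if j != i and math.hypot(p[0] - x, p[1] - y) < dist]
    vx = vy = 0.0
    if neigh:
        # Cohesion: steer toward the centroid of visible neighbours.
        cx = sum(p[0] for p in neigh) / len(neigh)
        cy = sum(p[1] for p in neigh) / len(neigh)
        vx += w_coh * (cx - x)
        vy += w_coh * (cy - y)
        # Separation: push away from neighbours closer than sep_dist.
        for px, py in neigh:
            d = math.hypot(px - x, py - y)
            if 0 < d < sep_dist:
                vx += w_sep * (x - px) / d
                vy += w_sep * (y - py) / d
    # Flight: move directly away from the dog when it is within dog_dist.
    dd = math.hypot(x - dog[0], y - dog[1])
    if 0 < dd < dog_dist:
        vx += w_dog * (x - dog[0]) / dd
        vy += w_dog * (y - dog[1]) / dd
    return vx, vy
```

With made-up positions (sheep 0 at the origin, three flockmates at x = 5, the dog at (1, 0) trying to push sheep 0 away from them), `w_coh=3.0` gives sheep 0 a positive x-velocity back toward the flock despite the dog, while `w_coh=0.2` lets the dog win. That is the same tension the table describes for Sequential in the tight regime.
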
## Results

Success rates from `training.eval` with `--max-steps 30000 --n-seeds 5`,
at deployment difficulty (sheep spawned across the full field):

| n (sheep) | Strömbom | Sequential | BC-flock (RL) |
|---:|---:|---:|---:|
| 1 | 100 % | 100 % | 100 % |
| 5 | 100 % | 100 % | 80–100 % |
| 8 | 100 % | 100 % | 80 % |
| 10 | **100 %** | 80 % | **80 %** (mean_penned 8/10) |

At n = 8 and n = 10 the BC policy reaches 80 % success against the
Strömbom teacher's 100 %, i.e. about 80 % of its teacher's success rate,
while running purely as neural-network inference with no hand-coded
herding logic.

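With only five seeds per cell, a success rate moves in steps of 20 percentage points (4 of 5 seeds = 80 %), so 80 % versus 100 % is a one-seed difference. The sketch below shows how a cell and a `mean_penned` value can be computed from per-seed penned counts. It assumes an episode counts as a success only when all n sheep are penned; the function name and input format are hypothetical and do not describe `training.eval`'s actual output.

```python
from statistics import mean
from typing import Dict, Sequence


def summarize(n_sheep: int, penned_per_seed: Sequence[int]) -> Dict[str, float]:
    """Success rate (%) and mean penned count over seeds (hypothetical format)."""
    if not penned_per_seed:
        raise ValueError("need at least one seed")
    # Assumption: success means every sheep ended up in the pen.
    successes = sum(1 for k in penned_per_seed if k == n_sheep)
    return {
        "success_pct": 100.0 * successes / len(penned_per_seed),
        "mean_penned": float(mean(penned_per_seed)),
        "step_pct": 100.0 / len(penned_per_seed),  # resolution of success_pct
    }
```

Reporting `mean_penned` alongside the success rate distinguishes a policy that narrowly misses one sheep from one that fails an episode outright, which a 5-seed success rate alone cannot show.
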
## License

Educational project for the *Topics in Intelligent Robotics* course.