# Autonomous Shepherd-Dog Herding (Webots + RL)

Group G25 — *Diogo Costa, Johnny Fernandes, Nelson Neto*

A differential-drive shepherd dog that herds 1–10 sheep through a 3 m
gate into an external pen. The dog has three deployable modes:

| Mode | Source | Role |
|---|---|---|
| `strombom` | Strömbom et al. (2014) collect/drive heuristic | Analytic baseline |
| `bc` | Behaviour cloning of the Strömbom teacher | Imitation learning result |
| `rl` | KL-regularised PPO fine-tune of `bc` | Reward-driven refinement |

`sequential` (single-target pin-and-push) is kept as an alternative
analytic baseline. `dagger` is a data-collection mode, not deployment.

## Perception

The dog perceives sheep **only through its front-mounted 140° LiDAR**
(180 rays, 12 m max range; see `protos/ShepherdDog.proto`). Each
control step:

1. Read `lidar.getRangeImage()`,
2. Cluster returns into world-frame `(x, y)` estimates
   (`herding/lidar_perception.py`),
3. Fold them into a multi-target tracker that maintains last-seen
   positions for sheep currently outside the FOV
   (`herding/sheep_tracker.py`).

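Steps 1 and 2 above can be sketched roughly as follows. This is a minimal illustration, not the code in `herding/lidar_perception.py`: the function names, the left-to-right ray ordering, the `GAP` threshold, and the assumption that the dog's pose is known are all hypothetical. Only the FOV, ray count, and max range come from this README.

```python
import math

FOV_RAD = math.radians(140.0)   # from the README
N_RAYS = 180                    # from the README
MAX_RANGE = 12.0                # metres, from the README
GAP = 0.5                       # assumed: max spacing between hits of one cluster


def scan_to_world_points(ranges, dog_x, dog_y, dog_heading):
    """Convert each ray into a world-frame hit point, or None if it saw nothing."""
    n = len(ranges)
    points = []
    for i, r in enumerate(ranges):
        if not math.isfinite(r) or r >= MAX_RANGE:
            points.append(None)          # no return on this ray
            continue
        # Assumed ordering: index 0 is the left edge of the field of view.
        bearing = FOV_RAD / 2 - i * FOV_RAD / (n - 1)
        angle = dog_heading + bearing
        points.append((dog_x + r * math.cos(angle),
                       dog_y + r * math.sin(angle)))
    return points


def cluster_centroids(points, gap=GAP):
    """Group consecutive hits closer than `gap`; return one centroid per group."""
    clusters, current = [], []
    for p in points:
        if p is not None and current and math.dist(p, current[-1]) <= gap:
            current.append(p)            # same object as the previous hit
        else:
            if current:
                clusters.append(current)
            current = [p] if p is not None else []
    if current:
        clusters.append(current)
    return [(sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            for c in clusters]
```

A real pipeline would also need to reject walls and gate posts, which this sketch does not attempt.
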
The tracker outputs a `{name: (x, y)}` dict shaped exactly like the
prior receiver-based one, so Strömbom, Sequential, and the BC
observation builder all run unchanged on top of it. The 2D Gymnasium
env raycasts sheep discs at training time (`herding/lidar_sim.py`), so
demos collected in the env match the perception the deployed
controller sees in Webots.

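To illustrate what "last-seen memory" means for the tracker's `{name: (x, y)}` output, here is a small greedy nearest-neighbour sketch. It is not the implementation in `herding/sheep_tracker.py`: the class name, the `gate` association radius, and the `sheep_N` naming scheme are assumptions made for the example.

```python
import math


class SheepTrackerSketch:
    """Greedy nearest-neighbour tracker that remembers targets it cannot see."""

    def __init__(self, gate=1.0):
        self.gate = gate        # assumed association radius in metres
        self.tracks = {}        # name -> last known (x, y)
        self._next_id = 0

    def update(self, detections):
        """Fold one step of detections into the tracks and return a copy."""
        unmatched = list(detections)
        for name, pos in self.tracks.items():
            if not unmatched:
                break
            nearest = min(unmatched, key=lambda d: math.dist(d, pos))
            if math.dist(nearest, pos) <= self.gate:
                self.tracks[name] = nearest     # refresh a visible track
                unmatched.remove(nearest)
            # Otherwise the last-seen position is kept (target out of view).
        for det in unmatched:                   # leftovers start new tracks
            self.tracks[f"sheep_{self._next_id}"] = det
            self._next_id += 1
        return dict(self.tracks)
```

Because unmatched tracks are never deleted here, a sheep that leaves the field of view stays in the output at its last-seen position, which is what lets downstream controllers consume the dict unchanged.
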
Privileged ground-truth perception is available for ablation via
`HerdingEnv(use_lidar=False)`.

## Quick start

```bash
# 1. Set up the Python env (any venv with PyTorch + SB3)
pip install -r training/requirements.txt

# 2. Smoke test
python -m training.parity_test

# 3. Reproduce the BC policy (~10 min on CPU: ~5 min demos + ~3 min BC)
python -m tools.collect_demos --teacher strombom \
    --out training/demos_v3.npz --seeds-per-n 15 --subsample 3 --frame-stack 4
python -m training.bc_pretrain --demos training/demos_v3.npz \
    --out training/runs/bc_v3 --epochs 60 --net-arch 512,512

# 4. Optional: DAgger from inside Webots if sim-trained doesn't transfer
tools/auto_dagger.sh 3 60
python -m tools.dagger_merge_train --out training/runs/bc_dagger

# 5. Evaluate (env)
python -m training.eval --policy training/runs/bc_v3 \
    --max-flock 10 --max-steps 8000 --n-seeds 5

# 6. Optional RL fine-tune of the BC policy (~40 min on CPU, 1 M steps)
python -m training.train_ppo \
    --bc training/runs/bc_v3 \
    --out training/runs/rl_v1 \
    --total-timesteps 1000000

# 7. Run in Webots
tools/run_webots.sh 10 bc         # behaviour-cloned MLP
tools/run_webots.sh 10 rl         # KL-PPO fine-tune
tools/run_webots.sh 10 strombom   # analytic baseline
```

## Layout

```
herding/                   — single source of truth (env + Webots both import)
  geometry.py              — field/pen constants, robot specs
  flocking_sim.py          — Reynolds-style sheep dynamics
  diffdrive.py             — differential-drive kinematics
  control.py               — shared near-sheep speed-modulation helper
  obs.py                   — 32-D order-invariant observation builder
  strombom.py              — canonical CoM-drive teacher
  sequential.py            — single-target "pin-and-push" teacher
  active_scan.py           — wraps a base teacher with opening rotation +
                             walk-to-centre + speed modulation
  lidar_sim.py             — fast 2D raycast for the env (sheep + walls + posts)
  lidar_perception.py      — scan → world-frame cluster centroids + filters
  sheep_tracker.py         — multi-target NN tracker with FOV memory

controllers/
  sheep/sheep.py           — Webots sheep controller (uses herding.flocking_sim)
  shepherd_dog/
    shepherd_dog.py        — Webots dog controller, mode-switched
    policy_loader.py       — lazy SB3 policy loader (auto-detects frame stack)

training/
  herding_env.py           — Gymnasium env (LiDAR + tracker by default)
  bc_pretrain.py           — supervised BC of (obs, action) demos into MLP
  eval.py                  — analytic + BC policy comparison harness
  parity_test.py           — shape / determinism smoke test
  runs/                    — checkpoints (whitelisted in .gitignore)
  requirements.txt

tools/
  collect_demos.py         — sim demos via the active-scan teacher
  dagger_merge_train.py    — merge Webots-collected DAgger demos and retrain
  run_webots.sh            — launch Webots with N sheep + chosen mode
  auto_dagger.sh           — headless DAgger collection across many runs

worlds/
  field.wbt                — main world (3 m gate, external pen)

protos/                    — Sheep / ShepherdDog robot definitions
docs/project.md            — original project goals
```

## Shared low-level control

Every dog mode (RL, Strömbom, Sequential, the DAgger teacher) routes
its action through `herding/control.py:modulate_speed_near_sheep`,
which scales action magnitude down when within ~2.5 m of the nearest
tracked sheep. This stops the dog from charging in at full speed and
scattering the flock. Direction (intent) is preserved.

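One way such a helper could behave is sketched below. This is an assumption-laden illustration rather than the body of `modulate_speed_near_sheep`: the function name, the 2-component action, the linear ramp, and the `MIN_SCALE` floor are hypothetical. Only the ~2.5 m radius comes from this README.

```python
import math

SLOW_RADIUS = 2.5   # metres, from the README
MIN_SCALE = 0.3     # assumed floor so the dog never stalls completely


def modulate_speed_sketch(action, dog_xy, sheep_positions):
    """Scale an action vector down linearly as the nearest sheep gets closer."""
    if not sheep_positions:
        return tuple(action)             # nothing tracked: leave action alone
    nearest = min(math.dist(dog_xy, p) for p in sheep_positions.values())
    if nearest >= SLOW_RADIUS:
        return tuple(action)             # far enough away: full speed
    scale = MIN_SCALE + (1.0 - MIN_SCALE) * nearest / SLOW_RADIUS
    # Every component gets the same factor, so the direction is unchanged.
    return tuple(scale * a for a in action)
```

Multiplying all components by one scalar is what keeps the intended direction intact while reducing magnitude.
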
All modes also share the same exponential-moving-average (EMA) action
smoother, configured by `ACTION_SMOOTH = 0.55` in
`controllers/shepherd_dog/shepherd_dog.py`.

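For readers unfamiliar with EMA smoothing, a minimal sketch follows. The constant's value is taken from this README, but whether it weights the previous action or the new one in the actual controller is not stated here; the sketch assumes it weights the previous action, and the function name is hypothetical.

```python
ACTION_SMOOTH = 0.55  # value from the README; its role as the weight on the
                      # previous action is an assumption


def smooth_action(prev, new, alpha=ACTION_SMOOTH):
    """Blend the previous smoothed action with the newly requested one."""
    return tuple(alpha * p + (1.0 - alpha) * n for p, n in zip(prev, new))
```

Under this assumption, a step change in the requested action reaches the wheels gradually: starting from rest, the first smoothed output is 45 % of the new command.
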
## Webots results (steps to all-penned, fast mode)

Single seed per cell using `worlds/field.wbt` defaults. All modes hit
100 % pen rate; numbers shown are time-to-all-penned in simulation
steps (16 ms each).

| n | Strömbom | `bc` | `rl` (KL-PPO of `bc`) |
|---:|---:|---:|---:|
| 3 | 5 800 | 9 800 | **4 800** |
| 5 | 10 200 | 9 200 | 9 800 |
| 8 | 14 000 | 17 600 | **15 400** |
| 10 | 18 600 | 19 600 | **12 000** |

The RL fine-tune is **39 % faster than `bc` on n=10** and **51 % faster
on n=3**; it is also 12.5 % faster on n=8 but about 7 % slower on n=5.
With a single seed per cell, this suggests that the KL-anchored PPO
finds reward-driven improvements over the BC imitation baseline rather
than just collapsing back to it.

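The percentages quoted for this table can be recomputed from its step counts. The short script below is only a convenience for checking the arithmetic: the dictionary layout and function name are illustrative, and the numbers are copied from the table.

```python
STEP_S = 0.016  # seconds per simulation step (16 ms, from the README)

# Steps to all-penned, copied from the results table.
steps = {
    3:  {"strombom": 5800,  "bc": 9800,  "rl": 4800},
    5:  {"strombom": 10200, "bc": 9200,  "rl": 9800},
    8:  {"strombom": 14000, "bc": 17600, "rl": 15400},
    10: {"strombom": 18600, "bc": 19600, "rl": 12000},
}


def speedup_pct(baseline, candidate):
    """Percentage reduction in steps relative to the baseline."""
    return 100.0 * (baseline - candidate) / baseline


for n, row in steps.items():
    gain = speedup_pct(row["bc"], row["rl"])
    sim_seconds = row["rl"] * STEP_S
    print(f"n={n:>2}: rl vs bc {gain:+.1f} %, rl takes {sim_seconds:.1f} s simulated")
```

A negative value means `rl` needed more steps than `bc` for that flock size.
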
## License

Educational project for the *Topics in Intelligent Robotics* course.