# Autonomous Shepherd-Dog Herding (Webots + RL)

Group G25: *Diogo Costa, Johnny Fernandes, Nelson Neto*

A differential-drive shepherd dog that herds 1–10 sheep through a 3 m
gate into an external pen. The dog has three modes:

| Mode | Source | Notes |
|---|---|---|
| `rl` | Behavior cloning (BC) of an analytic teacher | The deliverable RL policy |
| `strombom` | Strömbom et al. (2014) collect/drive heuristic | Canonical baseline |
| `sequential` | Single-target "pin and push" heuristic | Robust across n = 1–10 |

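The collect/drive rule behind the `strombom` mode can be sketched in a few lines. This is a minimal illustration of the heuristic as described by Strömbom et al. (2014), not the code in `herding/strombom.py`: the function names, the interaction radius `r_a` and the use of a single goal point (e.g. the gate) are assumptions, and the constants should be checked against the paper and the real teacher.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _behind(anchor: Point, away_from: Point, offset: float) -> Point:
    """Point `offset` metres behind `anchor`, on the side opposite `away_from`."""
    dx, dy = anchor[0] - away_from[0], anchor[1] - away_from[1]
    norm = math.hypot(dx, dy) or 1.0  # avoid division by zero when the points coincide
    return (anchor[0] + offset * dx / norm, anchor[1] + offset * dy / norm)


def strombom_target(sheep: List[Point], goal: Point, r_a: float = 1.0) -> Tuple[str, Point]:
    """Return ('collect' | 'drive', target position for the dog).

    Hypothetical sketch of the Strömbom et al. (2014) rule:
    - if every sheep lies within f(N) = r_a * N^(2/3) of the flock's centre
      of mass (CoM), drive: go to a point r_a * sqrt(N) behind the CoM,
      on the side opposite the goal;
    - otherwise collect: go to a point r_a behind the sheep furthest from the CoM.
    """
    n = len(sheep)
    com = (sum(x for x, _ in sheep) / n, sum(y for _, y in sheep) / n)
    furthest = max(sheep, key=lambda p: math.dist(p, com))
    if math.dist(furthest, com) <= r_a * n ** (2.0 / 3.0):
        return "drive", _behind(com, goal, r_a * math.sqrt(n))
    # Collect: approach the straggler from the far side so it is pushed back to the CoM
    return "collect", _behind(furthest, com, r_a)


mode, target = strombom_target([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], goal=(10.0, 0.0))
print(mode, target)
```

Calling `strombom_target(sheep_positions, goal=gate_xy)` once per control step gives the point the dog should steer towards; the `collect` branch fires whenever a straggler lies outside the N-dependent radius f(N).
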

Three further experimental teachers (`hybrid`, `drive_only`,
`strombom_smooth`) are documented in `herding/`.

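Whatever the mode, the dog is a differential-drive robot, so a target point must eventually become left/right wheel speeds. The sketch below shows a standard proportional go-to-point controller for that step. It is an assumption about how such a layer could look; the function name, gains and wheel base are hypothetical rather than taken from `herding/diffdrive.py` or `herding/geometry.py`.

```python
import math
from typing import Tuple


def go_to_point(pose: Tuple[float, float, float], target: Tuple[float, float],
                v_max: float = 0.5, k_turn: float = 2.0,
                wheel_base: float = 0.3) -> Tuple[float, float]:
    """Hypothetical go-to-point controller for a differential-drive robot.

    pose = (x, y, heading) in metres/radians; returns (v_left, v_right) in m/s.
    Gains and wheel base are illustrative values only.
    """
    x, y, theta = pose
    dx, dy = target[0] - x, target[1] - y
    # Heading error wrapped to [-pi, pi]
    err = math.atan2(dy, dx) - theta
    err = math.atan2(math.sin(err), math.cos(err))
    # No forward speed when facing away; slow down within 1 m of the target
    v = v_max * max(0.0, math.cos(err)) * min(1.0, math.hypot(dx, dy))
    omega = k_turn * err
    # Unicycle (v, omega) to wheel speeds: v_l = v - omega*L/2, v_r = v + omega*L/2
    return v - omega * wheel_base / 2.0, v + omega * wheel_base / 2.0


print(go_to_point((0.0, 0.0, 0.0), (5.0, 0.0)))  # → (0.5, 0.5)
```

Facing the target yields equal wheel speeds; a target to the left makes the right wheel faster, and a target directly behind the robot makes it turn in place before moving.
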
## Quick start

```bash
# 1. Set up the Python env (any venv with PyTorch + Stable-Baselines3)
pip install -r training/requirements.txt

# 2. Smoke test
python -m training.parity_test

# 3. Reproduce the BC policy from scratch (~25 min on CPU)
python -m tools.collect_demos --teacher strombom --out training/demos.npz \
    --seeds-per-n 30 --subsample 3
python -m training.bc_pretrain --demos training/demos.npz \
    --out training/runs/bc_flock --epochs 100 --net-arch 512,512

# 4. Evaluate
python -m training.eval --policy training/runs/bc_flock \
    --max-flock 10 --max-steps 30000 --n-seeds 5

# 5. Run in Webots (first argument: flock size n; second: mode)
HERDING_POLICY_DIR="$PWD/training/runs/bc_flock" tools/run_webots.sh 10 rl
tools/run_webots.sh 10 strombom
tools/run_webots.sh 10 sequential
```

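Step 3 is plain supervised learning: the policy is regressed onto the teacher's recorded (obs, action) pairs. The pure-Python sketch below shows that idea with a *linear* policy trained by stochastic gradient descent on squared error. `training/bc_pretrain.py` instead fits an MLP (`--net-arch 512,512`); the toy teacher and all names here are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Obs = Sequence[float]
Action = Sequence[float]


def behavior_clone(demos: List[Tuple[Obs, Action]], epochs: int = 200,
                   lr: float = 0.05, seed: int = 0) -> Callable[[Obs], List[float]]:
    """Fit a linear policy a = W @ [obs, 1] to teacher demos by minimising squared error."""
    rng = random.Random(seed)
    obs_dim, act_dim = len(demos[0][0]), len(demos[0][1])
    # One weight row per action dimension; the last column is the bias
    W = [[0.0] * (obs_dim + 1) for _ in range(act_dim)]
    data = list(demos)
    for _ in range(epochs):
        rng.shuffle(data)  # one SGD pass over a shuffled copy of the dataset
        for obs, act in data:
            x = list(obs) + [1.0]
            for i in range(act_dim):
                err = sum(w * xi for w, xi in zip(W[i], x)) - act[i]
                # Gradient of 0.5 * err**2 with respect to W[i][j] is err * x[j]
                for j in range(obs_dim + 1):
                    W[i][j] -= lr * err * x[j]

    def policy(obs: Obs) -> List[float]:
        x = list(obs) + [1.0]
        return [sum(w * xi for w, xi in zip(row, x)) for row in W]

    return policy


def toy_teacher(obs: Obs) -> List[float]:
    """Hypothetical stand-in for an analytic teacher (linear, so a linear clone can match it)."""
    return [0.5 * obs[0] - obs[1], 0.2]


gen = random.Random(1)
demos = []
for _ in range(100):
    o = [gen.uniform(-1.0, 1.0), gen.uniform(-1.0, 1.0)]
    demos.append((o, toy_teacher(o)))

policy = behavior_clone(demos)
print(policy([0.4, -0.2]))  # close to the teacher's [0.4, 0.2]
```

Because the toy teacher is itself linear, the clone recovers it almost exactly; a nonlinear teacher such as Strömbom calls for a nonlinear model like the MLP used by the real script.
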
## Layout

```
herding/                   # single source of truth (imported by both the env and Webots)
  geometry.py              # field/pen constants, robot specs
  flocking_sim.py          # Reynolds-style sheep dynamics
  diffdrive.py             # differential-drive kinematics
  obs.py                   # 32-D order-invariant observation builder
  strombom.py              # canonical CoM-drive teacher
  sequential.py            # single-target "pin-and-push" teacher
  hybrid.py                # flock-then-funnel (experimental, did not scale)
  drive_only.py            # Strömbom drive without collect (experimental)
  strombom_smooth.py       # sigmoid-blended Strömbom (experimental)

controllers/
  sheep/sheep.py           # Webots sheep controller (uses herding.flocking_sim)
  shepherd_dog/
    shepherd_dog.py        # Webots dog controller, mode-switched
    policy_loader.py       # lazy SB3 PPO loader
    strombom.py            # backwards-compat shim

training/
  herding_env.py           # Gymnasium env (used for demo collection + eval)
  bc_pretrain.py           # supervised BC of analytic teachers into an MLP policy
  collect_demos.py         # wrapper, see tools/
  eval.py                  # RL / analytic comparison harness
  parity_test.py           # smoke tests
  train_ppo.py             # PPO/RL fine-tune (experimental; BC alone is preferred)
  requirements.txt
  configs/ppo_default.yaml

tools/
  collect_demos.py         # generate (obs, action) demonstrations
  run_webots.sh            # launch Webots with N sheep + chosen controller mode

worlds/
  field.wbt                # main world (3 m gate, external pen)

protos/                    # Sheep / ShepherdDog robot definitions
docs/project.md            # original project goals
plan.md                    # design notes / decision log
```

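`herding/obs.py` builds a fixed 32-D, order-invariant observation, which lets a single MLP serve flocks of any size from 1 to 10. One common way to get that invariance is to describe the flock only through symmetric statistics (centroid, spread) plus a distance-sorted, zero-padded list of the nearest sheep. The sketch below illustrates this; the feature choice, `K_NEAREST`, `build_obs` and the 15-D output are assumptions and do not reproduce the real 32-D layout.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
K_NEAREST = 3  # hypothetical number of per-sheep slots kept after sorting


def build_obs(dog: Point, sheep: Sequence[Point], gate: Point) -> List[float]:
    """Hypothetical permutation-invariant observation (not the real obs.py layout).

    Each feature is a symmetric aggregate (mean, max) or comes from a list sorted
    by a total key, so reordering `sheep` cannot change the output.
    """
    n = len(sheep)
    # math.fsum is correctly rounded, hence independent of summation order
    cx = math.fsum(x for x, _ in sheep) / n
    cy = math.fsum(y for _, y in sheep) / n
    # Flock spread: largest distance of any sheep from the centroid
    spread = max(math.hypot(x - cx, y - cy) for x, y in sheep)
    # Sheep positions relative to the dog, nearest first (ties broken by coordinates)
    rel = sorted(((x - dog[0], y - dog[1]) for x, y in sheep),
                 key=lambda p: (math.hypot(p[0], p[1]), p[0], p[1]))
    slots: List[float] = []
    for k in range(K_NEAREST):
        if k < n:
            slots += [rel[k][0], rel[k][1], 1.0]  # 1.0 flags a real sheep
        else:
            slots += [0.0, 0.0, 0.0]  # zero padding when n < K_NEAREST
    return [cx - dog[0], cy - dog[1], gate[0] - cx, gate[1] - cy,
            spread, float(n)] + slots


obs = build_obs((0.0, 0.0), [(1.0, 2.0), (3.0, -1.0), (-2.0, 0.5)], gate=(5.0, 5.0))
print(len(obs))  # → 15
```

Shuffling the sheep list leaves the output unchanged, and a flock smaller than `K_NEAREST` is padded rather than changing the vector length.
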
## Two cohesion regimes

The strength of sheep cohesion determines which teacher works:

| Regime | `flocking_sim.py` setting | Strömbom | Sequential |
|---|---|---|---|
| **Tight** (current) | `w=3.0/1.0`, `dist=12` | works (herds the flock as a unit) | breaks (cohesion fights single-sheep targeting) |
| Loose | `w=1.5/0.6`, `dist=8` | breaks (flock fragments at the gate) | works (pens sheep one at a time) |

The codebase ships with the **tight** regime. To use the loose-regime
Sequential clone, edit those constants in `herding/flocking_sim.py` and
load `training/runs/bc_solo/` (for example by pointing `HERDING_POLICY_DIR`
at it).

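In a Reynolds-style model, cohesion pulls each sheep towards the centroid of its neighbours within some radius, scaled by a weight, while separation pushes very close neighbours apart. A larger weight or radius therefore gives a tighter flock, which is consistent with the tight regime using the larger `w` and `dist` values. The sketch below shows such a steering term; `steer` and its parameter names are hypothetical, and how the table's `w` and `dist` values map onto them in `herding/flocking_sim.py` is not asserted here.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]


def steer(i: int, pos: List[Vec], cohesion_w: float, separation_w: float,
          radius: float, min_sep: float = 1.0) -> Vec:
    """Hypothetical Reynolds-style steering for sheep i (cohesion + separation only)."""
    xi, yi = pos[i]
    neigh = [p for j, p in enumerate(pos) if j != i and math.dist(p, pos[i]) < radius]
    if not neigh:
        return (0.0, 0.0)  # no neighbours in range: no flocking force
    # Cohesion: vector towards the centroid of neighbours within `radius`
    cx = sum(x for x, _ in neigh) / len(neigh) - xi
    cy = sum(y for _, y in neigh) / len(neigh) - yi
    # Separation: push away from neighbours closer than `min_sep`, harder when closer
    sx = sy = 0.0
    for x, y in neigh:
        d = math.dist((x, y), (xi, yi))
        if 0.0 < d < min_sep:
            sx += (xi - x) / d**2
            sy += (yi - y) / d**2
    return (cohesion_w * cx + separation_w * sx, cohesion_w * cy + separation_w * sy)


# Illustrative weights only; two sheep 5 m apart
pull_strong = steer(0, [(0.0, 0.0), (5.0, 0.0)], cohesion_w=2.0, separation_w=1.0, radius=12.0)
pull_weak = steer(0, [(0.0, 0.0), (5.0, 0.0)], cohesion_w=1.0, separation_w=1.0, radius=6.0)
print(pull_strong, pull_weak)  # → (10.0, 0.0) (5.0, 0.0)
```

The pull scales linearly with `cohesion_w`, and a neighbour outside `radius` exerts no pull at all, which is one way a smaller radius lets a flock fragment.
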
## Results

Evaluated with `--max-steps 30000 --n-seeds 5` at deployment difficulty
(sheep spawned over the full field). Each cell is a success rate:

| n (sheep) | Strömbom | Sequential | BC-flock (RL) |
|---:|---:|---:|---:|
| 1 | 100 % | 100 % | 100 % |
| 5 | 100 % | 100 % | 80–100 % |
| 8 | 100 % | 100 % | 80 % |
| 10 | **100 %** | 80 % | **80 %** (mean_penned 8/10) |

The BC policy, cloned from the Strömbom teacher, matches it at n = 1 and
reaches 80 % success at n = 8 and n = 10 (about 80 % of the teacher's success
rate), while running purely as neural-network inference with no hand-coded
logic.

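Assuming one episode per seed, `--n-seeds 5` makes every success rate above a multiple of 20 % (80 % means 4 of 5 seeds succeeded), so a single seed moves a cell by 20 points. The sketch below shows one way to aggregate per-episode outcomes into the two numbers reported. It assumes an episode succeeds only when all sheep end up penned, which may differ from the exact criterion in `training/eval.py`; `Episode` and `summarize` are hypothetical names.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Episode:
    n_sheep: int  # flock size for this run
    penned: int   # sheep inside the pen when the episode ended


def summarize(episodes: List[Episode]) -> Dict[int, Dict[str, float]]:
    """Group episodes by flock size and report success rate and mean sheep penned.

    Assumed criterion: an episode succeeds only if all of its sheep are penned.
    """
    by_n: Dict[int, List[Episode]] = {}
    for ep in episodes:
        by_n.setdefault(ep.n_sheep, []).append(ep)
    return {
        n: {
            "success_rate": mean(1.0 if e.penned == n else 0.0 for e in eps),
            "mean_penned": mean(e.penned for e in eps),
        }
        for n, eps in sorted(by_n.items())
    }


# Hypothetical outcomes for 5 seeds at n = 5 (illustrative, not measured)
runs = [Episode(5, 5), Episode(5, 5), Episode(5, 5), Episode(5, 3), Episode(5, 5)]
print(summarize(runs))
```
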
## License

Educational project for the *Topics in Intelligent Robotics* course.