Checkpoint 6

Johnny Fernandes
2026-05-11 10:35:48 +01:00
parent b457155538
commit fce0e0c786
27 changed files with 194 additions and 704 deletions
+38 -38
@@ -12,7 +12,7 @@ gate into an external pen. The dog has three deployable modes:
| `rl` | KL-regularised PPO fine-tune of `bc` | Reward-driven refinement |
`sequential` (single-target pin-and-push) is kept as an alternative
analytic baseline. `dagger` is a data-collection mode, not deployment.
analytic baseline.
## Perception
@@ -28,13 +28,13 @@ control step:
(`herding/sheep_tracker.py`).
**LiDAR validation** (intermediate-goal item v from `docs/project.md`):
run the dog controller in `HERDING_MODE=diag` mode to capture 80
real Webots scans plus the ground-truth sheep positions in
`training/dagger/diag_<ts>.npz`. Comparing detections against GT in
that file showed clustered centroids match GT positions within 0.15 m
after the +SHEEP_RADIUS surface-to-centre correction — i.e. the
LiDAR pipeline produces correct sheep-position estimates from the
real Webots scan, validating the sensor for the herding task.
during development a diagnostic-dump controller captured 80 real
Webots scans plus the ground-truth sheep positions. Comparing
detections against GT showed clustered centroids match GT positions
within 0.15 m after the +SHEEP_RADIUS surface-to-centre correction —
i.e. the LiDAR pipeline produces correct sheep-position estimates
from the real Webots scan, validating the sensor for the herding
task.
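The check described above can be sketched roughly as follows. This is a minimal illustration, not the project's validation script: the `SHEEP_RADIUS` value, the helper names `surface_to_centre` and `nearest_gt_errors`, and the synthetic data are all assumptions. It also simplifies the correction to a shift of exactly one radius along the sensor ray.

```python
import math

SHEEP_RADIUS = 0.30  # assumed value in metres; the real constant lives in the repo
TOLERANCE = 0.15     # match threshold quoted in the text, in metres


def surface_to_centre(sensor_xy, surface_xy, radius=SHEEP_RADIUS):
    """Push a surface-hit centroid away from the sensor by `radius` along the ray."""
    dx = surface_xy[0] - sensor_xy[0]
    dy = surface_xy[1] - sensor_xy[1]
    rng = math.hypot(dx, dy)
    if rng == 0.0:
        return surface_xy  # degenerate case: no ray direction to extend
    scale = (rng + radius) / rng
    return (sensor_xy[0] + dx * scale, sensor_xy[1] + dy * scale)


def nearest_gt_errors(detections, ground_truth):
    """Distance from each detection to its nearest ground-truth position."""
    return [min(math.dist(d, g) for g in ground_truth) for d in detections]


# Synthetic example (hypothetical data, not taken from the captured scans).
dog = (0.0, 0.0)
gt = [(3.0, 0.0), (0.0, 4.0)]
# Each surface centroid sits one radius closer to the dog than the true centre.
surface = [(2.7, 0.0), (0.0, 3.7)]

corrected = [surface_to_centre(dog, s) for s in surface]
errors = nearest_gt_errors(corrected, gt)
print(all(e <= TOLERANCE for e in errors))
```

Without the correction, the same synthetic centroids sit about one radius short of the ground truth, which is why the surface-to-centre step matters for the 0.15 m figure.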
The tracker outputs a `{name: (x, y)}` dict shaped exactly like the
prior receiver-based one, so Strömbom, Sequential, and the BC obs
@@ -53,7 +53,7 @@ Privileged ground-truth perception is available for ablation —
pip install -r training/requirements.txt
# 2. Smoke test
python -m training.parity_test
python -m tests.parity_test
# 3. Reproduce the BC policy (~10 min on CPU: ~5 min demos + ~3 min BC)
python -m tools.collect_demos --teacher strombom \
@@ -61,21 +61,17 @@ python -m tools.collect_demos --teacher strombom \
python -m training.bc_pretrain --demos training/demos.npz \
--out training/runs/bc --epochs 60 --net-arch 512,512
# 4. Optional: DAgger from inside Webots if sim-trained doesn't transfer
tools/auto_dagger.sh 3 60
python -m tools.dagger_merge_train --out training/runs/bc_dagger
# 5. Evaluate (env)
python -m training.eval --policy training/runs/bc \
--max-flock 10 --max-steps 8000 --n-seeds 5
# 6. Optional RL fine-tune of the BC policy (~40 min on CPU, 1 M steps)
# 4. KL-PPO fine-tune of the BC policy (~30 min on CPU, 1 M steps)
python -m training.train_ppo \
--bc training/runs/bc \
--out training/runs/rl \
--total-timesteps 1000000
# 7. Run in Webots
# 5. Evaluate (env)
python -m training.eval --policy training/runs/rl \
--max-flock 10 --max-steps 15000 --n-seeds 10
# 6. Run in Webots
tools/run_webots.sh 10 bc # behaviour-cloned MLP
tools/run_webots.sh 10 rl # KL-PPO fine-tune
tools/run_webots.sh 10 strombom # analytic baseline
@@ -84,22 +80,25 @@ tools/run_webots.sh 10 strombom # analytic baseline
## Layout
```
herding/ — single source of truth (env + Webots both import)
geometry.py — field/pen constants, robot specs
flocking_sim.py — Reynolds-style sheep dynamics
diffdrive.py — differential-drive kinematics
control.py — shared near-sheep speed-modulation helper
herding/ — perception / control / world primitives
obs.py — 32-D order-invariant observation builder
strombom.py — canonical CoM-drive teacher
sequential.py — single-target "pin-and-push" teacher
active_scan.py — wraps a base teacher with opening rotation +
walk-to-centre + speed modulation
lidar_sim.py — fast 2D raycast for the env (sheep + walls + posts)
lidar_perception.py — scan → world-frame cluster centroids + filters
sheep_tracker.py — multi-target NN tracker with FOV memory
world/ — environment-side physics & geometry
geometry.py — field/pen constants, robot specs
diffdrive.py — differential-drive kinematics
flocking_sim.py — Reynolds + Strömbom 2014 sheep dynamics
perception/ — LiDAR → tracked-sheep pipeline
lidar_sim.py — fast 2D raycast for the env
lidar_perception.py — scan → world-frame cluster centroids + filters
sheep_tracker.py — multi-target NN tracker with FOV memory
control/ — every dog mode's action source
strombom.py — canonical CoM collect/drive heuristic
sequential.py — single-target "pin-and-push" alternative
active_scan.py — wraps a base teacher with opening rotation +
walk-to-centre fallback
modulation.py — shared near-sheep speed-modulation helper
controllers/
sheep/sheep.py — Webots sheep controller (uses herding.flocking_sim)
sheep/sheep.py — Webots sheep controller (uses herding.world.flocking_sim)
shepherd_dog/
shepherd_dog.py — Webots dog controller, mode-switched
policy_loader.py — lazy SB3 policy loader (auto-detects frame stack)
@@ -107,16 +106,17 @@ controllers/
training/
herding_env.py — Gymnasium env (LiDAR + tracker by default)
bc_pretrain.py — supervised BC of (obs, action) demos into MLP
eval.py — analytic + BC policy comparison harness
parity_test.py — shape / determinism smoke test
train_ppo.py — KL-regularised PPO fine-tune of BC
eval.py — analytic + learned policy comparison harness
runs/ — checkpoints (whitelisted in .gitignore)
requirements.txt
tests/
parity_test.py — shape / determinism / baseline smoke test
tools/
collect_demos.py — sim demos via the active-scan teacher
dagger_merge_train.py — merge Webots-collected DAgger demos and retrain
run_webots.sh — launch Webots with N sheep + chosen mode
auto_dagger.sh — headless DAgger collection across many runs
worlds/
field.wbt — main world (3 m gate, external pen)
@@ -127,8 +127,8 @@ docs/project.md — original project goals
## Shared low-level control
Every dog mode (RL, Strömbom, Sequential, the DAgger teacher) routes
its action through `herding/control.py:modulate_speed_near_sheep`,
Every dog mode (Strömbom, Sequential, BC, RL) routes its action
through `herding/control/modulation.py:modulate_speed_near_sheep`,
which scales action magnitude down when within ~2.5 m of the nearest
tracked sheep. This stops the dog from charging in at full speed and
scattering the flock. Direction (intent) is preserved.
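A rough sketch of this kind of modulation is shown below. It is an illustration under stated assumptions, not the contents of `modulation.py`. The name `modulate_speed_sketch`, the linear ramp, the `MIN_SCALE` floor, and the treatment of the action as a plain tuple of components are all hypothetical. Only the ~2.5 m radius and the `{name: (x, y)}` tracker shape come from the text.

```python
import math

SLOW_RADIUS = 2.5  # metres; the "~2.5 m" figure from the text
MIN_SCALE = 0.3    # assumed floor on the speed scale (hypothetical)


def modulate_speed_sketch(action, dog_xy, sheep_xy_by_name):
    """Scale every action component by one factor that shrinks near the nearest sheep.

    Uniform scaling changes the magnitude only, so the ratio between the
    components (the intended direction) is unchanged.
    """
    if not sheep_xy_by_name:
        return tuple(action)  # nothing tracked: leave the action alone
    nearest = min(math.dist(dog_xy, p) for p in sheep_xy_by_name.values())
    if nearest >= SLOW_RADIUS:
        return tuple(action)
    # Linear ramp from MIN_SCALE at contact up to 1.0 at SLOW_RADIUS (assumed shape).
    scale = MIN_SCALE + (1.0 - MIN_SCALE) * (nearest / SLOW_RADIUS)
    return tuple(a * scale for a in action)


far = modulate_speed_sketch((1.0, 0.5), (0.0, 0.0), {"sheep_1": (5.0, 0.0)})
near = modulate_speed_sketch((1.0, 0.5), (0.0, 0.0), {"sheep_1": (1.25, 0.0)})
print(far, near)
```

Applying a single scale factor to the whole action is what keeps the behaviour identical across modes: each controller decides where to go, and the shared helper only decides how fast.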