Checkpoint 6
@@ -12,7 +12,7 @@ gate into an external pen. The dog has three deployable modes:
 | `rl` | KL-regularised PPO fine-tune of `bc` | Reward-driven refinement |

 `sequential` (single-target pin-and-push) is kept as an alternative
-analytic baseline. `dagger` is a data-collection mode, not deployment.
+analytic baseline.

 ## Perception

@@ -28,13 +28,13 @@ control step:
 (`herding/sheep_tracker.py`).

 **LiDAR validation** (intermediate-goal item v from `docs/project.md`):
-run the dog controller in `HERDING_MODE=diag` mode to capture 80
-real Webots scans plus the ground-truth sheep positions in
-`training/dagger/diag_<ts>.npz`. Comparing detections against GT in
-that file showed clustered centroids match GT positions within 0.15 m
-after the +SHEEP_RADIUS surface-to-centre correction — i.e. the
-LiDAR pipeline produces correct sheep-position estimates from the
-real Webots scan, validating the sensor for the herding task.
+during development a diagnostic-dump controller captured 80 real
+Webots scans plus the ground-truth sheep positions. Comparing
+detections against GT showed clustered centroids match GT positions
+within 0.15 m after the +SHEEP_RADIUS surface-to-centre correction —
+i.e. the LiDAR pipeline produces correct sheep-position estimates
+from the real Webots scan, validating the sensor for the herding
+task.

 The tracker outputs a `{name: (x, y)}` dict shaped exactly like the
 prior receiver-based one, so Strömbom, Sequential, and the BC obs
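The LiDAR validation compares clustered centroids against ground truth after a +SHEEP_RADIUS surface-to-centre correction. Below is a minimal Python sketch of that check. It assumes the correction pushes each centroid outward by SHEEP_RADIUS along the bearing from the sensor. `SHEEP_RADIUS = 0.3`, `surface_to_centre` and `max_match_error` are hypothetical stand-ins, not the project's code. The nearest-neighbour error is also a simplification of whatever comparison was actually run.

```python
import math

# Hypothetical value for illustration; substitute the project's real SHEEP_RADIUS.
SHEEP_RADIUS = 0.3  # metres


def surface_to_centre(sensor_xy, centroid_xy, radius=SHEEP_RADIUS):
    """Move a surface-hit centroid `radius` further along the sensor -> centroid ray.

    LiDAR only sees the near face of a sheep, so the centroid of its hits
    sits roughly one radius short of the body centre.
    """
    dx = centroid_xy[0] - sensor_xy[0]
    dy = centroid_xy[1] - sensor_xy[1]
    rng = math.hypot(dx, dy)
    if rng == 0.0:
        return centroid_xy  # degenerate: no bearing to extend along
    scale = (rng + radius) / rng
    return (sensor_xy[0] + dx * scale, sensor_xy[1] + dy * scale)


def max_match_error(estimates, ground_truth):
    """Worst distance from any ground-truth sheep to its nearest estimate (not a 1:1 assignment)."""
    return max(min(math.dist(gt, est) for est in estimates) for gt in ground_truth)


# Sensor at the origin, sheep centred at (2, 0): its near face is hit at x = 1.7.
est = surface_to_centre((0.0, 0.0), (1.7, 0.0))
print(est, max_match_error([est], [(2.0, 0.0)]) <= 0.15)
```

A real run would feed in every detection and ground-truth position from the captured scans and compare the worst error with the 0.15 m figure.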
@@ -53,7 +53,7 @@ Privileged ground-truth perception is available for ablation —
 pip install -r training/requirements.txt

 # 2. Smoke test
-python -m training.parity_test
+python -m tests.parity_test

 # 3. Reproduce the BC policy (~10 min on CPU: ~5 min demos + ~3 min BC)
 python -m tools.collect_demos --teacher strombom \
@@ -61,21 +61,17 @@ python -m tools.collect_demos --teacher strombom \
 python -m training.bc_pretrain --demos training/demos.npz \
     --out training/runs/bc --epochs 60 --net-arch 512,512

-# 4. Optional: DAgger from inside Webots if sim-trained doesn't transfer
-tools/auto_dagger.sh 3 60
-python -m tools.dagger_merge_train --out training/runs/bc_dagger
-
-# 5. Evaluate (env)
-python -m training.eval --policy training/runs/bc \
-    --max-flock 10 --max-steps 8000 --n-seeds 5
-
-# 6. Optional RL fine-tune of the BC policy (~40 min on CPU, 1 M steps)
+# 4. KL-PPO fine-tune of the BC policy (~30 min on CPU, 1 M steps)
 python -m training.train_ppo \
     --bc training/runs/bc \
     --out training/runs/rl \
     --total-timesteps 1000000

-# 7. Run in Webots
+# 5. Evaluate (env)
+python -m training.eval --policy training/runs/rl \
+    --max-flock 10 --max-steps 15000 --n-seeds 10
+
+# 6. Run in Webots
 tools/run_webots.sh 10 bc # behaviour-cloned MLP
 tools/run_webots.sh 10 rl # KL-PPO fine-tune
 tools/run_webots.sh 10 strombom # analytic baseline
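Step 4 of the quickstart fine-tunes the BC policy with KL-regularised PPO. One common form of that objective adds a penalty β·KL(π_θ ‖ π_BC) to the clipped PPO surrogate, which keeps the fine-tuned policy close to the behaviour-cloned prior. The plain-Python sketch below shows that form. Several details are assumptions: the KL direction, the coefficient `beta`, the clip range, and whether `train_ppo.py` applies the penalty in the loss or in the reward. The function names are hypothetical.

```python
import math


def gaussian_kl(mu_p, log_std_p, mu_q, log_std_q):
    """KL(p || q) for diagonal Gaussians, summed over action dimensions."""
    total = 0.0
    for mp, lp, mq, lq in zip(mu_p, log_std_p, mu_q, log_std_q):
        var_p, var_q = math.exp(2.0 * lp), math.exp(2.0 * lq)
        total += lq - lp + (var_p + (mp - mq) ** 2) / (2.0 * var_q) - 0.5
    return total


def kl_ppo_loss(ratios, advantages, kls_to_bc, clip=0.2, beta=0.1):
    """Loss to minimise: negative clipped PPO surrogate plus beta * mean KL to the BC policy.

    ratios:     pi_theta(a|s) / pi_old(a|s) per sample
    advantages: advantage estimate per sample
    kls_to_bc:  KL(pi_theta(.|s) || pi_BC(.|s)) per sample, e.g. from gaussian_kl
    """
    n = len(ratios)
    surrogate = 0.0
    for r, a in zip(ratios, advantages):
        clipped_r = min(max(r, 1.0 - clip), 1.0 + clip)
        surrogate += min(r * a, clipped_r * a)  # pessimistic (clipped) bound
    surrogate /= n
    return -surrogate + beta * sum(kls_to_bc) / n


# One sample whose ratio has moved past the clip range, with the policy mean
# shifted one standard deviation away from the BC policy.
kl = gaussian_kl([1.0], [0.0], [0.0], [0.0])
print(kl, kl_ppo_loss([1.5], [1.0], [kl]))
```

Raising `beta` trades reward improvement for staying closer to the BC policy that the demos produced.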
@@ -84,22 +80,25 @@ tools/run_webots.sh 10 strombom # analytic baseline
 ## Layout

 ```
-herding/ — single source of truth (env + Webots both import)
-  geometry.py — field/pen constants, robot specs
-  flocking_sim.py — Reynolds-style sheep dynamics
-  diffdrive.py — differential-drive kinematics
-  control.py — shared near-sheep speed-modulation helper
+herding/ — perception / control / world primitives
   obs.py — 32-D order-invariant observation builder
-  strombom.py — canonical CoM-drive teacher
-  sequential.py — single-target "pin-and-push" teacher
-  active_scan.py — wraps a base teacher with opening rotation +
-                   walk-to-centre + speed modulation
-  lidar_sim.py — fast 2D raycast for the env (sheep + walls + posts)
-  lidar_perception.py — scan → world-frame cluster centroids + filters
-  sheep_tracker.py — multi-target NN tracker with FOV memory
+  world/ — environment-side physics & geometry
+    geometry.py          field/pen constants, robot specs
+    diffdrive.py         differential-drive kinematics
+    flocking_sim.py      Reynolds + Strömbom 2014 sheep dynamics
+  perception/ — LiDAR → tracked-sheep pipeline
+    lidar_sim.py         fast 2D raycast for the env
+    lidar_perception.py  scan → world-frame cluster centroids + filters
+    sheep_tracker.py     multi-target NN tracker with FOV memory
+  control/ — every dog mode's action source
+    strombom.py          canonical CoM collect/drive heuristic
+    sequential.py        single-target "pin-and-push" alternative
+    active_scan.py       wraps a base teacher with opening rotation +
+                         walk-to-centre fallback
+    modulation.py        shared near-sheep speed-modulation helper

 controllers/
-  sheep/sheep.py — Webots sheep controller (uses herding.flocking_sim)
+  sheep/sheep.py — Webots sheep controller (uses herding.world.flocking_sim)
   shepherd_dog/
     shepherd_dog.py — Webots dog controller, mode-switched
     policy_loader.py — lazy SB3 policy loader (auto-detects frame stack)
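The layout describes `control/strombom.py` as a canonical CoM collect/drive heuristic. The sketch below follows the rule as it is usually stated for Strömbom et al. (2014):

- If every sheep lies within r_a·N^(2/3) of the flock's centre of mass, the dog heads for a driving point r_a·√N behind the centre of mass, on the side opposite the goal.
- Otherwise it heads for a collecting point r_a behind the sheep furthest from the centre of mass.

The default `r_a = 2.0` m and the function names are hypothetical. The project's implementation may also add terms not shown here, such as the speed modulation.

```python
import math


def _behind(point, away_from, offset):
    """Point `offset` beyond `point` on the ray from `away_from` through `point`."""
    dx, dy = point[0] - away_from[0], point[1] - away_from[1]
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        return point  # no defined direction; stay put
    return (point[0] + offset * dx / norm, point[1] + offset * dy / norm)


def strombom_target(sheep, goal, r_a=2.0):
    """Target position for the dog under a Strömbom-style collect/drive rule.

    sheep: non-empty list of (x, y) positions (e.g. the tracker's dict values)
    goal:  (x, y) point the flock should be driven to (e.g. the pen gate)
    r_a:   sheep-sheep interaction distance in metres (hypothetical default)
    """
    n = len(sheep)
    com = (math.fsum(x for x, _ in sheep) / n, math.fsum(y for _, y in sheep) / n)
    furthest = max(sheep, key=lambda s: math.dist(s, com))
    if math.dist(furthest, com) <= r_a * n ** (2.0 / 3.0):
        # Flock is cohesive: drive from behind the centre of mass, opposite the goal.
        return _behind(com, goal, r_a * math.sqrt(n))
    # Flock is spread out: collect the straggler from the side away from the centre of mass.
    return _behind(furthest, com, r_a)


# A tight 2 x 2 flock with the gate to the east: the dog should stand to the west.
print(strombom_target([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)], (10.0, 0.5)))
```

The cohesion threshold grows sub-linearly with N, so larger flocks are allowed to be proportionally denser before the dog switches from collecting to driving.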
@@ -107,16 +106,17 @@ controllers/
 training/
   herding_env.py — Gymnasium env (LiDAR + tracker by default)
   bc_pretrain.py — supervised BC of (obs, action) demos into MLP
-  eval.py — analytic + BC policy comparison harness
-  parity_test.py — shape / determinism smoke test
   train_ppo.py — KL-regularised PPO fine-tune of BC
+  eval.py — analytic + learned policy comparison harness
   runs/ — checkpoints (whitelisted in .gitignore)
   requirements.txt

+tests/
+  parity_test.py — shape / determinism / baseline smoke test
+
 tools/
   collect_demos.py — sim demos via the active-scan teacher
-  dagger_merge_train.py — merge Webots-collected DAgger demos and retrain
   run_webots.sh — launch Webots with N sheep + chosen mode
-  auto_dagger.sh — headless DAgger collection across many runs

 worlds/
   field.wbt — main world (3 m gate, external pen)
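The layout lists `herding/obs.py` as a 32-D order-invariant observation builder. "Order-invariant" means the vector depends only on the set of tracked sheep, not on the order in which the tracker happens to list them. The much smaller, hypothetical feature set below illustrates the idea using only symmetric reductions: `math.fsum`, `max` and sorting. The features, `k` and `max_range` are assumptions and do not reproduce the project's 32 dimensions.

```python
import math


def order_invariant_features(dog_xy, sheep_xy, goal_xy, k=3, max_range=10.0):
    """Hypothetical, much smaller stand-in for an order-invariant observation.

    Every feature is a symmetric function of the sheep positions (fsum, max,
    sorting), so reordering `sheep_xy` cannot change the result.
    Assumes at least one tracked sheep.
    """
    n = len(sheep_xy)
    cx = math.fsum(x for x, _ in sheep_xy) / n
    cy = math.fsum(y for _, y in sheep_xy) / n
    spread = max(math.dist((cx, cy), s) for s in sheep_xy)
    # Dog-to-sheep distances, ascending; pad with max_range when fewer than k sheep are tracked.
    nearest = sorted(math.dist(dog_xy, s) for s in sheep_xy)[:k]
    nearest += [max_range] * (k - len(nearest))
    return [
        cx - dog_xy[0], cy - dog_xy[1],    # flock centre relative to the dog
        goal_xy[0] - cx, goal_xy[1] - cy,  # goal relative to the flock centre
        spread,
        float(n),
        *nearest,
    ]


sheep = [(1.0, 2.0), (3.0, -1.0), (0.5, 0.5), (4.0, 4.0)]
print(order_invariant_features((0.0, 0.0), sheep, (10.0, 0.0)))
```

`math.fsum` is used instead of `sum` because it is correctly rounded. The centroid is therefore bit-for-bit identical under any permutation of the input list, whereas a plain floating-point `sum` can differ in the last bit.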
@@ -127,8 +127,8 @@ docs/project.md — original project goals

 ## Shared low-level control

-Every dog mode (RL, Strömbom, Sequential, the DAgger teacher) routes
-its action through `herding/control.py:modulate_speed_near_sheep`,
+Every dog mode (Strömbom, Sequential, BC, RL) routes its action
+through `herding/control/modulation.py:modulate_speed_near_sheep`,
 which scales action magnitude down when within ~2.5 m of the nearest
 tracked sheep. This stops the dog from charging in at full speed and
 scattering the flock. Direction (intent) is preserved.
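The shared helper scales the action's magnitude when the dog is within ~2.5 m of the nearest tracked sheep, while keeping the action's direction. The minimal sketch below assumes the action is a 2-D vector and uses a linear ramp from a floor `MIN_SCALE` at contact up to full speed at 2.5 m. The ramp shape, `MIN_SCALE = 0.3` and `modulate_action` are hypothetical; this is not the signature of `modulate_speed_near_sheep`.

```python
import math

SLOW_RADIUS = 2.5  # metres, the "~2.5 m" threshold described above
MIN_SCALE = 0.3    # hypothetical floor so the dog never stops dead next to a sheep


def speed_scale(dist_to_nearest, slow_radius=SLOW_RADIUS, min_scale=MIN_SCALE):
    """1.0 beyond slow_radius, ramping linearly down to min_scale at zero distance."""
    if dist_to_nearest >= slow_radius:
        return 1.0
    frac = max(dist_to_nearest, 0.0) / slow_radius
    return min_scale + (1.0 - min_scale) * frac


def modulate_action(action, dog_xy, tracked_sheep):
    """Scale a 2-D action by proximity to the nearest tracked sheep, keeping its direction.

    tracked_sheep: {name: (x, y)} dict, the shape the tracker outputs.
    """
    if not tracked_sheep:
        return tuple(action)  # nothing tracked: leave the action untouched
    nearest = min(math.dist(dog_xy, p) for p in tracked_sheep.values())
    s = speed_scale(nearest)
    # One non-negative scalar for every component preserves the direction (intent).
    return tuple(s * a for a in action)


print(modulate_action((1.0, 2.0), (0.0, 0.0), {"s1": (1.25, 0.0), "s2": (4.0, 0.0)}))
```

Multiplying every component by the same scalar is what keeps the intent intact. For a differential-drive (v, ω) action it also preserves the turning curvature v/ω, whereas clamping each component separately would bend the path.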