Checkpoint 7

Johnny Fernandes
2026-05-11 12:21:51 +01:00
parent fce0e0c786
commit a01a5c9cef
34 changed files with 1266 additions and 1038 deletions
+34 -28
@@ -22,10 +22,10 @@ control step:
1. Read `lidar.getRangeImage()`,
2. Cluster returns into world-frame `(x, y)` estimates
(`herding/lidar_perception.py`),
(`herding/perception/lidar_perception.py`),
3. Fold them into a multi-target tracker that maintains last-seen
positions for sheep currently outside the FOV
(`herding/sheep_tracker.py`).
(`herding/perception/sheep_tracker.py`).
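
As an illustration of step 3, here is a minimal sketch of a nearest-neighbour tracker that keeps the last-seen position of any target that drops out of view. It is not the code in `sheep_tracker.py`: the class name, the `gate` distance, and the `sheep_<i>` naming are assumptions made for this example, and only the `{name: (x, y)}` output shape comes from this README.

```python
import math


class NearestNeighbourTracker:
    """Hypothetical sketch: greedy NN association with last-seen memory."""

    def __init__(self, gate=0.5):
        self.gate = gate      # max association distance in metres (assumed value)
        self.tracks = {}      # name -> (x, y), last-seen world-frame position
        self._next_id = 0

    def update(self, detections):
        """Fold one scan's world-frame (x, y) detections into the tracks."""
        unmatched = list(detections)
        for name, (tx, ty) in list(self.tracks.items()):
            if not unmatched:
                break  # remaining tracks simply keep their last-seen position
            # Nearest remaining detection to this track.
            best = min(unmatched, key=lambda d: math.hypot(d[0] - tx, d[1] - ty))
            if math.hypot(best[0] - tx, best[1] - ty) <= self.gate:
                self.tracks[name] = best
                unmatched.remove(best)
            # Otherwise the target is treated as out of view and is not moved.
        # Any detection left over starts a new track.
        for det in unmatched:
            self.tracks[f"sheep_{self._next_id}"] = det
            self._next_id += 1
        return dict(self.tracks)


tracker = NearestNeighbourTracker(gate=0.5)
first = tracker.update([(0.0, 0.0), (2.0, 2.0)])   # two sheep visible
second = tracker.update([(0.1, 0.0)])              # second sheep leaves the FOV
```

After the second call the first track has moved to the new detection, while the second track still reports `(2.0, 2.0)`. A real tracker would also need track expiry and a better association rule than this greedy one.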
**LiDAR validation** (intermediate-goal item v from `docs/project.md`):
during development a diagnostic-dump controller captured 80 real
@@ -39,7 +39,7 @@ task.
The tracker outputs a `{name: (x, y)}` dict shaped exactly like the
prior receiver-based one, so Strömbom, Sequential, and the BC obs
builder all run unchanged on top of it. The 2D Gymnasium env
(`herding/lidar_sim.py`) raycasts sheep discs at training time, so
(`herding/perception/lidar_sim.py`) raycasts sheep discs at training time, so
demos collected in the env match the perception the deployed
controller sees in Webots.
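
For readers unfamiliar with raycasting against discs, here is a minimal sketch of the geometry, assuming each sheep is a circle `(cx, cy, r)` and each LiDAR beam is a ray from the robot position. The function name, parameters, and `max_range` value are hypothetical, and the real `lidar_sim.py` may be organised quite differently.

```python
import math


def raycast_discs(origin, angles, discs, max_range=5.0):
    """Return one range per beam angle (radians, world frame).

    Hypothetical sketch: discs are (cx, cy, r) tuples; a beam that hits
    nothing reports max_range. Origins inside a disc are not handled.
    """
    ox, oy = origin
    ranges = []
    for angle in angles:
        dx, dy = math.cos(angle), math.sin(angle)   # unit beam direction
        best = max_range
        for cx, cy, r in discs:
            fx, fy = cx - ox, cy - oy
            proj = fx * dx + fy * dy                 # distance along the beam to closest approach
            perp2 = fx * fx + fy * fy - proj * proj  # squared miss distance
            if proj < 0 or perp2 > r * r:
                continue                             # disc is behind the beam or missed
            t = proj - math.sqrt(r * r - perp2)      # near intersection point
            if 0 <= t < best:
                best = t
        ranges.append(best)
    return ranges


# One disc of radius 0.5 m centred 2 m ahead along +x.
scan = raycast_discs((0.0, 0.0), [0.0, math.pi / 2], [(2.0, 0.0, 0.5)])
```

The beam along +x hits the disc at 1.5 m, while the beam along +y misses and reports `max_range`.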
@@ -52,36 +52,32 @@ Privileged ground-truth perception is available for ablation —
# 1. Set up the Python env (any venv with PyTorch + SB3)
pip install -r training/requirements.txt
# 2. Smoke test
python -m tests.parity_test
# 2. Smoke test (70 pytest cases, < 1 s)
make test
# 3. Reproduce the BC policy (~10 min on CPU: ~5 min demos + ~3 min BC)
python -m tools.collect_demos --teacher strombom \
--out training/demos.npz --seeds-per-n 15 --subsample 3 --frame-stack 4
python -m training.bc_pretrain --demos training/demos.npz \
--out training/runs/bc --epochs 60 --net-arch 512,512
# 3. Reproduce the full pipeline (~30-60 min CPU)
make # demos -> bc -> rl -> eval
# 4. KL-PPO fine-tune of the BC policy (~30 min on CPU, 1 M steps)
python -m training.train_ppo \
--bc training/runs/bc \
--out training/runs/rl \
--total-timesteps 1000000
# Individual stages (each rebuilds upstream artefacts if missing):
make bc_demos # sim demos
make bc # behaviour clone
make rl # KL-PPO fine-tune
make eval # 10-seed env eval of rl
# 5. Evaluate (env)
python -m training.eval --policy training/runs/rl \
--max-flock 10 --max-steps 15000 --n-seeds 10
# 6. Run in Webots
tools/run_webots.sh 10 bc # behaviour-cloned MLP
tools/run_webots.sh 10 rl # KL-PPO fine-tune
tools/run_webots.sh 10 strombom # analytic baseline
# 4. Run in Webots
make webots N=10 MODE=bc # behaviour-cloned MLP
make webots N=10 MODE=rl # KL-PPO fine-tune
make webots N=10 MODE=strombom # analytic baseline
# (or invoke directly: tools/run_webots.sh 10 rl)
```
`make help` lists every target and the overridable hyperparameters
(e.g. `make rl PPO_STEPS=2000000 KL=0.02`).
## Layout
```
herding/ — perception / control / world primitives
obs.py — 32-D order-invariant observation builder
world/ — environment-side physics & geometry
geometry.py field/pen constants, robot specs
diffdrive.py differential-drive kinematics
@@ -90,6 +86,7 @@ herding/ — perception / control / world primitives
lidar_sim.py fast 2D raycast for the env
lidar_perception.py scan → world-frame cluster centroids + filters
sheep_tracker.py multi-target NN tracker with FOV memory
obs.py 32-D order-invariant observation builder
control/ — every dog mode's action source
strombom.py canonical CoM collect/drive heuristic
sequential.py single-target "pin-and-push" alternative
@@ -105,19 +102,28 @@ controllers/
training/
herding_env.py — Gymnasium env (LiDAR + tracker by default)
bc_pretrain.py — supervised BC of (obs, action) demos into MLP
train_ppo.py — KL-regularised PPO fine-tune of BC
bc/collect.py — sim demos via the active-scan teacher
bc/pretrain.py — supervised BC of (obs, action) demos into MLP
rl/train.py — KL-regularised PPO fine-tune of BC
eval.py — analytic + learned policy comparison harness
bc/demos.npz — collected demonstrations (gitignored)
runs/ — checkpoints (whitelisted in .gitignore)
requirements.txt
tests/
parity_test.py — shape / determinism / baseline smoke test
conftest.py — pytest setup (adds project root to sys.path)
test_geometry.py — geometric predicates + constants
test_diffdrive.py — kinematics and (vx, vy) → wheel-speed map
test_obs.py — observation builder (shape, normalisation, order)
test_control.py — speed modulation + analytic teachers + active scan
test_perception.py — LiDAR sim + clustering + tracker
test_env.py — Gymnasium contract + determinism + reward
tools/
collect_demos.py — sim demos via the active-scan teacher
run_webots.sh — launch Webots with N sheep + chosen mode
Makefile — pipeline orchestrator (make / make rl / make test / …)
worlds/
field.wbt — main world (3 m gate, external pen)