Checkpoint 7
@@ -22,10 +22,10 @@ control step:
1. Read `lidar.getRangeImage()`,
2. Cluster returns into world-frame `(x, y)` estimates
(`herding/lidar_perception.py`),
(`herding/perception/lidar_perception.py`),
3. Fold them into a multi-target tracker that maintains last-seen
positions for sheep currently outside the FOV
(`herding/sheep_tracker.py`).
(`herding/perception/sheep_tracker.py`).
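The last step above can be illustrated with a minimal sketch of a nearest-neighbour tracker that keeps last-seen positions. This is not the code in `sheep_tracker.py`: the class name `LastSeenTracker`, the greedy matching, and the `gate` distance are all assumptions made for illustration.

```python
import math


class LastSeenTracker:
    """Hypothetical nearest-neighbour tracker with last-seen memory."""

    def __init__(self, initial, gate=1.0):
        # initial: {name: (x, y)} seed positions.
        # gate: assumed maximum association distance, in metres.
        self.positions = dict(initial)
        self.gate = gate

    def update(self, detections):
        """Match detections to tracks greedily, closest pair first.

        Tracks with no detection inside the gate keep their previous
        position, which is how out-of-FOV sheep stay in the output.
        """
        pairs = sorted(
            (math.dist(pos, det), name, i)
            for name, pos in self.positions.items()
            for i, det in enumerate(detections)
        )
        used_names, used_dets = set(), set()
        for dist, name, i in pairs:
            if dist > self.gate:
                break  # every remaining pair is even farther apart
            if name in used_names or i in used_dets:
                continue
            self.positions[name] = tuple(detections[i])
            used_names.add(name)
            used_dets.add(i)
        return dict(self.positions)


tracker = LastSeenTracker({"sheep0": (0.0, 0.0), "sheep1": (5.0, 5.0)})
# Only one cluster is visible this step; sheep1 is outside the FOV.
estimate = tracker.update([(0.2, 0.1)])
```

Here `estimate` still contains both names: `sheep0` moves to the new detection and `sheep1` keeps its last-seen position.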

**LiDAR validation** (intermediate-goal item v from `docs/project.md`):
during development a diagnostic-dump controller captured 80 real
@@ -39,7 +39,7 @@ task.
The tracker outputs a `{name: (x, y)}` dict shaped exactly like the
prior receiver-based one, so Strömbom, Sequential, and the BC obs
builder all run unchanged on top of it. The 2D Gymnasium env
(`herding/lidar_sim.py`) raycasts sheep discs at training time, so
(`herding/perception/lidar_sim.py`) raycasts sheep discs at training time, so
demos collected in the env match the perception the deployed
controller sees in Webots.
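As a rough illustration of raycasting sheep discs in 2D, the sketch below intersects a fan of beams with circles and returns one range per beam. It is not the code in `lidar_sim.py`: the function name `raycast_discs`, its parameters, and the defaults are assumptions, and it assumes the sensor sits outside every disc.

```python
import math


def raycast_discs(origin, heading, discs, n_beams=8, fov=math.pi, max_range=10.0):
    """Return one range per beam; beams that hit nothing report max_range.

    discs is an iterable of (cx, cy, radius). All names and defaults
    here are illustrative, not taken from the project.
    """
    ox, oy = origin
    ranges = []
    for k in range(n_beams):
        # Spread beams evenly across the field of view, centred on heading.
        frac = k / (n_beams - 1) if n_beams > 1 else 0.5
        angle = heading + (frac - 0.5) * fov
        dx, dy = math.cos(angle), math.sin(angle)
        best = max_range
        for cx, cy, r in discs:
            fx, fy = cx - ox, cy - oy
            # Distance along the beam to the point closest to the disc centre.
            proj = fx * dx + fy * dy
            # Squared distance from the disc centre to the beam line.
            perp_sq = fx * fx + fy * fy - proj * proj
            if proj < 0 or perp_sq > r * r:
                continue  # disc centre is behind the sensor, or the beam misses
            # Near intersection of the beam with the circle.
            t = proj - math.sqrt(r * r - perp_sq)
            if 0.0 <= t < best:
                best = t
        ranges.append(best)
    return ranges


# One sheep of radius 0.5 m, 2 m straight ahead of a sensor at the origin.
scan = raycast_discs((0.0, 0.0), 0.0, [(2.0, 0.0, 0.5)], n_beams=3)
```

With three beams over a 180° field of view, only the centre beam hits the disc, so `scan` holds the maximum range on both sides and the distance to the disc's near edge in the middle.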
@@ -52,36 +52,32 @@ Privileged ground-truth perception is available for ablation —
# 1. Set up the Python env (any venv with PyTorch + SB3)
pip install -r training/requirements.txt

# 2. Smoke test
python -m tests.parity_test
# 2. Smoke test (70 pytest cases, < 1 s)
make test

# 3. Reproduce the BC policy (~10 min on CPU: ~5 min demos + ~3 min BC)
python -m tools.collect_demos --teacher strombom \
--out training/demos.npz --seeds-per-n 15 --subsample 3 --frame-stack 4
python -m training.bc_pretrain --demos training/demos.npz \
--out training/runs/bc --epochs 60 --net-arch 512,512
# 3. Reproduce the full pipeline (~30–60 min CPU)
make # demos -> bc -> rl -> eval

# 4. KL-PPO fine-tune of the BC policy (~30 min on CPU, 1 M steps)
python -m training.train_ppo \
--bc training/runs/bc \
--out training/runs/rl \
--total-timesteps 1000000
# Individual stages (each rebuilds upstream artefacts if missing):
make bc_demos # sim demos
make bc # behaviour clone
make rl # KL-PPO fine-tune
make eval # 10-seed env eval of rl

# 5. Evaluate (env)
python -m training.eval --policy training/runs/rl \
--max-flock 10 --max-steps 15000 --n-seeds 10

# 6. Run in Webots
tools/run_webots.sh 10 bc # behaviour-cloned MLP
tools/run_webots.sh 10 rl # KL-PPO fine-tune
tools/run_webots.sh 10 strombom # analytic baseline
# 4. Run in Webots
make webots N=10 MODE=bc # behaviour-cloned MLP
make webots N=10 MODE=rl # KL-PPO fine-tune
make webots N=10 MODE=strombom # analytic baseline
# (or invoke directly: tools/run_webots.sh 10 rl)
```

`make help` lists every target and the overridable hyperparameters
(e.g. `make rl PPO_STEPS=2000000 KL=0.02`).

## Layout

```
herding/ — perception / control / world primitives
obs.py — 32-D order-invariant observation builder
world/ — environment-side physics & geometry
geometry.py field/pen constants, robot specs
diffdrive.py differential-drive kinematics
@@ -90,6 +86,7 @@ herding/ — perception / control / world primitives
lidar_sim.py fast 2D raycast for the env
lidar_perception.py scan → world-frame cluster centroids + filters
sheep_tracker.py multi-target NN tracker with FOV memory
obs.py 32-D order-invariant observation builder
control/ — every dog mode's action source
strombom.py canonical CoM collect/drive heuristic
sequential.py single-target "pin-and-push" alternative
@@ -105,19 +102,28 @@ controllers/
training/
herding_env.py — Gymnasium env (LiDAR + tracker by default)
bc_pretrain.py — supervised BC of (obs, action) demos into MLP
train_ppo.py — KL-regularised PPO fine-tune of BC
bc/collect.py — sim demos via the active-scan teacher
bc/pretrain.py — supervised BC of (obs, action) demos into MLP
rl/train.py — KL-regularised PPO fine-tune of BC
eval.py — analytic + learned policy comparison harness
bc/demos.npz — collected demonstrations (gitignored)
runs/ — checkpoints (whitelisted in .gitignore)
requirements.txt

tests/
parity_test.py — shape / determinism / baseline smoke test
conftest.py — pytest setup (adds project root to sys.path)
test_geometry.py — geometric predicates + constants
test_diffdrive.py — kinematics and (vx, vy) → wheel-speed map
test_obs.py — observation builder (shape, normalisation, order)
test_control.py — speed modulation + analytic teachers + active scan
test_perception.py — LiDAR sim + clustering + tracker
test_env.py — Gymnasium contract + determinism + reward

tools/
collect_demos.py — sim demos via the active-scan teacher
run_webots.sh — launch Webots with N sheep + chosen mode

Makefile — pipeline orchestrator (make / make rl / make test / …)

worlds/
field.wbt — main world (3 m gate, external pen)