Project-wide cleanup: gitignore, dead code, stale artifacts, README

Repo hygiene pass after a long working session.

Files removed:
* stage1_train.log — runtime training log (~125 KB), shouldn't have
  been tracked.
* training/bc/demos.npz — orphan default-name demos file from before
  the world+drive-suffixed naming convention took over; no script
  references it.
* training/runs/bc_dagger{1,2}_differential_field/policy.zip — failed
  DAgger experiment artifacts. Per `memory/dagger_results.md` the
  whole DAgger experiment hit 0/5 on Webots transfer; these checkpoints
  have no consumers.

Untracked-but-deleted (no git change) — also cleaned from disk:
* Root-level runtime logs (43 *.log files, all unused — gitignored now).
* training/bc/{combined,dagger}*.npz (5 huge demo blobs, 2.6 GB
  reclaimed; not committed).
* training/bc/v1/ (2.6 GB backup of pre-DAgger demos; reclaimed).
* training/runs/at_20260426_*/ (orphan timestamped runs; reclaimed).
* All __pycache__/.

Dead code removed:
* `herding/control/strombom.py::compute_action_debug` — no callers
  anywhere in the repo.
* `herding/control/sequential.py::compute_action_debug` — same.
* `herding/control/universal.py::compute_action_diff` — same.

.gitignore extended to cover:
* All *.log files (training/eval/webots logs are runtime artifacts).
* training/bc/*.npz (re-collectable on demand by `make bc_demos`).
* training/bc/v1/.
* .pytest_cache, *.pyc, .claude/.

README refreshed:
* Mecanum + round-world coverage in the headline.
* Quick-start updated for DRIVE/WORLD-suffixed Makefile targets,
  GT-bypass example, and the mecanum-retrain caveat.
* Layout reflects the actual current tree (config.py, both protos,
  both worlds, all tools).
* Results table replaced with the Webots end-to-end numbers from
  the 2026-05-16 sweep (8/8 diff combos + LiDAR/GT comparison).

Verification: 126 pytest cases still pass (was 126 going in — no
test-coverage regression from the dead-code removal).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
Johnny Fernandes
2026-05-17 01:38:19 +00:00
parent 3b4c99a6c4
commit a584a034e9
9 changed files with 127 additions and 182 deletions
+17 -1
View File
@@ -1,5 +1,7 @@
# Python # Python
__pycache__/ __pycache__/
*.pyc
.pytest_cache/
# Training artefacts: ignore all run outputs except deployable policies # Training artefacts: ignore all run outputs except deployable policies
training/runs/** training/runs/**
@@ -8,8 +10,22 @@ training/runs/**
!training/runs/*/ !training/runs/*/
!training/runs/*/policy.zip !training/runs/*/policy.zip
# Webots launcher scratch # BC demo blobs — these get regenerated by `python -m training.bc.collect`
# and are too large to track. Keep them out of git.
training/bc/*.npz
training/bc/v1/
# Webots launcher scratch (the _test.wbt files are emitted on every run)
worlds/** worlds/**
!worlds/field.wbt !worlds/field.wbt
!worlds/field_round.wbt !worlds/field_round.wbt
herding_runtime.cfg herding_runtime.cfg
# Runtime logs — all of these are produced by training/eval/webots runs
# and are not useful to track in git. Keep summary docs/markdown only.
*.log
calibrate_mecanum.log
training/.run_done
# Tooling
.claude/
+110 -78
View File
@@ -2,18 +2,18 @@
Group G25 — *Diogo Costa, Johnny Fernandes, Nelson Neto* Group G25 — *Diogo Costa, Johnny Fernandes, Nelson Neto*
A differential-drive shepherd dog that herds 110 sheep through a 3 m A shepherd dog that herds 110 sheep through a 3 m gate into an
gate into an external pen. The dog has three deployable modes: external pen. Two worlds (`field` rectangular, `field_round` circular),
two drives (`differential`, `mecanum`), and four deployable control
modes:
| Mode | Source | Role | | Mode | Source | Role |
|---|---|---| |---|---|---|
| `strombom` | Strömbom et al. (2014) collect/drive heuristic | Analytic baseline | | `strombom` | Strömbom et al. (2014) collect/drive heuristic | Analytic baseline |
| `bc` | Behaviour cloning of the Strömbom teacher | Imitation learning result | | `sequential` | Single-target "pin-and-push" | Alternative analytic baseline |
| `bc` | Behaviour cloning of the universal teacher | Imitation learning result |
| `rl` | KL-regularised PPO fine-tune of `bc` | Reward-driven refinement | | `rl` | KL-regularised PPO fine-tune of `bc` | Reward-driven refinement |
`sequential` (single-target pin-and-push) is kept as an alternative
analytic baseline.
## Perception ## Perception
The dog perceives sheep **only through its front-mounted 140° LiDAR** The dog perceives sheep **only through its front-mounted 140° LiDAR**
@@ -52,27 +52,39 @@ Privileged ground-truth perception is available for ablation —
# 1. Set up the Python env (any venv with PyTorch + SB3) # 1. Set up the Python env (any venv with PyTorch + SB3)
pip install -r training/requirements.txt pip install -r training/requirements.txt
# 2. Smoke test (70 pytest cases, < 1 s) # 2. Smoke test (126 pytest cases, < 1 s)
make test make test
# 3. Reproduce the full pipeline (~3060 min CPU) # 3. Reproduce a full pipeline (DRIVE+WORLD specific, ~1 h CPU)
make # demos -> bc -> rl -> eval make DRIVE=differential WORLD=field # demos -> bc -> rl -> eval
make DRIVE=differential WORLD=field_round
make DRIVE=mecanum WORLD=field # see note below
make train_all # all 4 combos sequentially
# Individual stages (each rebuilds upstream artefacts if missing): # Individual stages (each rebuilds upstream artefacts if missing):
make bc_demos # sim demos make DRIVE=differential WORLD=field bc_demos # sim demos
make bc # behaviour clone make DRIVE=differential WORLD=field bc # behaviour clone
make rl # KL-PPO fine-tune make DRIVE=differential WORLD=field rl # KL-PPO fine-tune
make eval # 10-seed env eval of rl make DRIVE=differential WORLD=field eval # 10-seed env eval
# 4. Run in Webots # 4. Run in Webots
make webots N=10 MODE=bc # behaviour-cloned MLP tools/run_webots.sh 10 bc differential field # BC, diff, rect field
make webots N=10 MODE=rl # KL-PPO fine-tune tools/run_webots.sh 10 rl differential field_round # RL, diff, round field
make webots N=10 MODE=strombom # analytic baseline tools/run_webots.sh 5 strombom differential field # analytic baseline
# (or invoke directly: tools/run_webots.sh 10 rl) HERDING_USE_GT=1 tools/run_webots.sh 5 strombom differential field
# GT bypass for ablation
``` ```
`make help` lists every target and the overridable hyperparameters `make help` lists every target and the overridable hyperparameters.
(e.g. `make rl PPO_STEPS=2000000 KL=0.02`).
**Mecanum note**: the `ShepherdDogMecanum.proto` uses physical roller
hinges in Webots (committed 2026-05-16). The Webots calibration shows
a ~60% strafe efficiency and ~28% backward bleed compared to textbook
mecanum; the gym kinematics in `HERDING_MEC_WEBOTS` are tuned to
match. **Mecanum BC/RL policies need to be retrained against this
preset** — see `mecanum_proto_gap.md` in `memory/` for the 3-command
flow. The v1 policies in `training/runs/{bc,rl}_mecanum_*` predate the
proto rewrite and will not herd reliably in Webots until retrained.
## Documentation map ## Documentation map
@@ -87,56 +99,67 @@ make webots N=10 MODE=strombom # analytic baseline
``` ```
herding/ — perception / control / world primitives herding/ — perception / control / world primitives
world/ environment-side physics & geometry config.pyfrozen dataclasses for all tunable parameters;
geometry.py field/pen constants, robot specs named presets HERDING_DEFAULT / HERDING_WEBOTS /
diffdrive.py differential-drive kinematics HERDING_MEC_WEBOTS
world/
geometry.py field/pen constants, world-shape switch
diffdrive.py differential + mecanum kinematics
flocking_sim.py Reynolds + Strömbom 2014 sheep dynamics flocking_sim.py Reynolds + Strömbom 2014 sheep dynamics
perception/ — LiDAR → tracked-sheep pipeline perception/
lidar_sim.py fast 2D raycast for the env lidar_sim.py fast 2D raycast for the gym env
lidar_perception.py scan → world-frame cluster centroids + filters lidar_perception.py scan → world-frame cluster centroids + filters
sheep_tracker.py multi-target NN tracker with FOV memory sheep_tracker.py multi-target NN tracker with FOV memory
and the consensus-promotion stage
obs.py 32-D order-invariant observation builder obs.py 32-D order-invariant observation builder
control/ — every dog mode's action source control/
strombom.py canonical CoM collect/drive heuristic strombom.py canonical CoM collect/drive heuristic
(round-world aware)
sequential.py single-target "pin-and-push" alternative sequential.py single-target "pin-and-push" alternative
active_scan.py wraps a base teacher with opening rotation + universal.py teacher used for BC demo collection
walk-to-centre fallback (Strömbom + mecanum omega + straggler recovery)
active_scan.py rotate-on-empty + walk-to-centre fallback
modulation.py shared near-sheep speed-modulation helper modulation.py shared near-sheep speed-modulation helper
controllers/ controllers/
sheep/sheep.py — Webots sheep controller (uses herding.world.flocking_sim) sheep/sheep.py — Webots sheep controller
shepherd_dog/ shepherd_dog/
shepherd_dog.py — Webots dog controller, mode-switched shepherd_dog.py — Webots dog controller, mode-switched
policy_loader.py — lazy SB3 policy loader (auto-detects frame stack) policy_loader.py — SB3 PPO / RecurrentPPO loader with frame stack
training/ training/
herding_env.py — Gymnasium env (LiDAR + tracker by default) herding_env.py — Gymnasium env (LiDAR + tracker by default)
bc/collect.py — sim demos via the active-scan teacher bc/collect.py — sim demos via the active-scan teacher
bc/pretrain.py — supervised BC of (obs, action) demos into MLP bc/pretrain.py — supervised BC into MLP
rl/train.py — KL-regularised PPO fine-tune of BC rl/train.py — KL-regularised PPO fine-tune of BC
rl/train_lstm.py — RecurrentPPO variant (ablation)
eval.py — analytic + learned policy comparison harness eval.py — analytic + learned policy comparison harness
bc/demos.npz — collected demonstrations (gitignored)
runs/ — checkpoints (whitelisted in .gitignore) runs/ — checkpoints (whitelisted in .gitignore)
requirements.txt requirements.txt
tests/ tests/ — 126 pytest cases, < 1 s on CPU
conftest.py — pytest setup (adds project root to sys.path)
test_geometry.py — geometric predicates + constants
test_diffdrive.py — kinematics and (vx, vy) → wheel-speed map
test_obs.py — observation builder (shape, normalisation, order)
test_control.py — speed modulation + analytic teachers + active scan
test_perception.py — LiDAR sim + clustering + tracker
test_env.py — Gymnasium contract + determinism + reward
tools/ tools/
run_webots.sh — launch Webots with N sheep + chosen mode run_webots.sh — launch Webots with N sheep + chosen mode + world
webots_sweep.sh — headless sweep across modes × drives × worlds
webots_sweep_gt.sh — same with HERDING_USE_GT=1 (perfect perception)
calibrate_mecanum.sh — measure mecanum body velocity vs gym prediction
gen_mecanum_wheels.py — regenerate the 32 mecanum roller hinges
benchmark_lidar.py — tracker quality benchmark
Makefile — pipeline orchestrator (make / make rl / make test / …) Makefile — pipeline orchestrator
(make DRIVE=… WORLD=… rl, make train_all, …)
worlds/ worlds/
field.wbt — main world (3 m gate, external pen) field.wbt — rectangular world (3 m gate, external pen)
field_round.wbt — circular world (radius 15 m, same pen)
protos/
Sheep.proto — sheep robot
ShepherdDog.proto — diff-drive dog, 140° LiDAR
ShepherdDog360.proto — diff-drive dog, 360° LiDAR (ablation)
ShepherdDogMecanum.proto — 4-wheel mecanum with physical roller hinges
protos/ — Sheep / ShepherdDog robot definitions
docs/project.md — original course proposal/goals docs/project.md — original course proposal/goals
``` ```
@@ -151,48 +174,57 @@ scattering the flock. Direction (intent) is preserved.
All modes also share the same EMA action smoother in All modes also share the same EMA action smoother in
`controllers/shepherd_dog/shepherd_dog.py:ACTION_SMOOTH = 0.55`. `controllers/shepherd_dog/shepherd_dog.py:ACTION_SMOOTH = 0.55`.
## Results — env eval, 10 seeds × n=1..10 ## Results — Webots end-to-end, canonical 140° LiDAR
`max_steps=15000`, full-field spawn distribution. Success rate per Each cell = "OK at step X" means the dog penned all N sheep in a single
flock size, then mean steps over successful seeds. trial, `HERDING_USE_GT=0` (LiDAR perception, no ground truth bypass),
default consensus tracker.
### Success rate (%) ### Differential drive
| n | Strömbom | `bc` | `rl` | | Mode | World | n=5 | n=10 |
|---:|---:|---:|---:| |---|---|---:|---:|
| 1 | 30 | 80 | **90** | | Strömbom | field | 7528 | 11620 |
| 2 | 90 | 50 | **90** | | Strömbom | field_round | 8611 | 10339 |
| 3 | 60 | 90 | **90** | | Sequential | field | 7135 | 16843 |
| 4 | 40 | 80 | **90** | | Sequential | field_round | 6019 | 8494 |
| 5 | 60 | 70 | **100** | | BC | field | 11698 | 15079 |
| 6 | 30 | 80 | 80 | | BC | field_round | 7234 | 11320 |
| 7 | 70 | 80 | **100** | | RL | field | 10039 | 13954 |
| 8 | 30 | 100 | **100** | | RL | field_round | 5803 | 9151 |
| 9 | 40 | 90 | **100** |
| 10 | 50 | 100 | **100** |
### Mean penned per episode (out of n) RL is **strictly faster than BC** on every comparable cell.
| n | Strömbom | `bc` | `rl` | ### LiDAR vs GT bypass (diff drive)
|---:|---:|---:|---:|
| 1 | 0.30 | 0.80 | **0.90** |
| 5 | 3.90 | 4.10 | **5.00** |
| 8 | 4.20 | 8.00 | **8.00** |
| 10 | 7.40 | 10.00 | **10.00** |
### Takeaways GT bypass replaces the LiDAR tracker with perfect emitter positions.
LiDAR is the default; GT is a perception ablation
(`HERDING_USE_GT=1`):
- **BC clearly beats Strömbom** under realistic LiDAR conditions (full | Mode | World | n=5 LiDAR | n=5 GT | n=10 LiDAR | n=10 GT |
field, partial observability). Strömbom struggles on small flocks |---|---|---:|---:|---:|---:|
where a single sheep can spawn beyond the LiDAR's 12 m range; BC | Strömbom | field | 7528 | **5254** | 11620 | **7342** |
learned active perception from the demos. | Strömbom | field_round | 8611 | **3631** | 10339 | **7084** |
- **RL refines BC** without regressing on any cell. Ties or beats BC | Sequential | field | **7135** | 11092 | 16843 | **8698** |
at every flock size; biggest gains at n=1 and n=4 where BC's | Sequential | field_round | 6019 | **3454** | 8494 | **7324** |
imitation of Strömbom's drive heuristic was sub-optimal.
- **Aggressive reward shaping doesn't help** — a more aggressive GT is generally faster (perfect perception → fewer wasted steps).
variant (β=0.02, W_TIME=-0.1, W_IMITATE=0, 3 M steps) trained as Sequential n=5 / field is the one cell where GT is *slower* — its
an ablation was strictly worse than the conservative tune shipped straggler heuristic appears to over-correct when the dog has full
here (β=0.05, W_IMITATE=0.5, 1 M steps). information.
### Mecanum (differential is the headline)
The `ShepherdDogMecanum.proto` was rewritten on 2026-05-16 with 32
physical roller hinges, giving true omnidirectional motion in Webots
(`tools/calibrate_mecanum.sh` confirms the X-pattern). The mecanum
calibration shows ~60% strafe efficiency vs textbook (vs ~89% on
forward), so v1 mecanum BC/RL policies trained on textbook gym
mecanum no longer herd reliably. The fix is staged but not run:
the gym now has `HERDING_MEC_WEBOTS` which matches Webots' physical
mecanum, and `training/bc/collect.py` / `training/rl/train.py` auto-
select this preset for mecanum runs. Retraining (≈ 2 h per combo,
4 combos) is the documented future step.
## License ## License
-45
View File
@@ -80,48 +80,3 @@ def compute_action(dog_xy, sheep_positions, pen_target=PEN_ENTRY):
ax, ay = _unit(tx - dog_xy[0], ty - dog_xy[1]) ax, ay = _unit(tx - dog_xy[0], ty - dog_xy[1])
return ax, ay, mode return ax, ay, mode
def compute_action_debug(dog_xy, sheep_positions, pen_target=PEN_ENTRY):
"""``compute_action`` plus a debug dict."""
active = [(x, y) for (x, y) in sheep_positions.values() if _is_active(x, y)]
if not active:
return 0.0, 0.0, "idle", {
"n_active": 0, "phase": "idle", "radius": 0.0, "threshold": 0.0,
"com_x": 0.0, "com_y": 0.0,
"target_x": dog_xy[0], "target_y": dog_xy[1],
}
n = len(active)
com_x = sum(p[0] for p in active) / n
com_y = sum(p[1] for p in active) / n
dists = [math.hypot(p[0] - com_x, p[1] - com_y) for p in active]
radius = max(dists)
threshold = F_FACTOR * math.sqrt(n)
if n <= STRAGGLER_THRESHOLD:
sx, sy = min(active,
key=lambda p: math.hypot(p[0] - pen_target[0],
p[1] - pen_target[1]))
ux, uy = _unit(sx - pen_target[0], sy - pen_target[1])
tx, ty = sx + DELTA_TARGET * ux, sy + DELTA_TARGET * uy
mode = "targeted"
elif radius > threshold:
idx = max(range(n), key=lambda i: dists[i])
sx, sy = active[idx]
ux, uy = _unit(sx - com_x, sy - com_y)
tx, ty = sx + DELTA_COLLECT * ux, sy + DELTA_COLLECT * uy
mode = "collect"
else:
ux, uy = _unit(com_x - pen_target[0], com_y - pen_target[1])
tx, ty = com_x + DELTA_DRIVE * ux, com_y + DELTA_DRIVE * uy
mode = "drive"
ax, ay = _unit(tx - dog_xy[0], ty - dog_xy[1])
return ax, ay, mode, {
"n_active": n, "phase": mode, "radius": radius, "threshold": threshold,
"com_x": com_x, "com_y": com_y,
"target_x": tx, "target_y": ty,
}
-37
View File
@@ -76,40 +76,3 @@ def compute_action(dog_xy, sheep_positions, pen_target=PEN_ENTRY):
ax, ay = _unit(tx - dog_xy[0], ty - dog_xy[1]) ax, ay = _unit(tx - dog_xy[0], ty - dog_xy[1])
return ax, ay, mode return ax, ay, mode
def compute_action_debug(dog_xy, sheep_positions, pen_target=PEN_ENTRY):
"""``compute_action`` plus a small debug dict (CoM, target, radius)."""
active = [(x, y) for (x, y) in sheep_positions.values() if _is_active(x, y)]
if not active:
return 0.0, 0.0, "idle", {
"n_active": 0, "radius": 0.0, "threshold": 0.0,
"com_x": 0.0, "com_y": 0.0,
"target_x": dog_xy[0], "target_y": dog_xy[1],
}
n = len(active)
com_x = sum(p[0] for p in active) / n
com_y = sum(p[1] for p in active) / n
dists = [math.hypot(p[0] - com_x, p[1] - com_y) for p in active]
radius = max(dists)
threshold = F_FACTOR * math.sqrt(n)
if radius > threshold:
idx = max(range(n), key=lambda i: dists[i])
sx, sy = active[idx]
ux, uy = _unit(sx - com_x, sy - com_y)
tx, ty = sx + DELTA_COLLECT * ux, sy + DELTA_COLLECT * uy
mode = "collect"
else:
ux, uy = _unit(com_x - pen_target[0], com_y - pen_target[1])
tx, ty = com_x + DELTA_DRIVE * ux, com_y + DELTA_DRIVE * uy
mode = "drive"
ax, ay = _unit(tx - dog_xy[0], ty - dog_xy[1])
dbg = {
"n_active": n, "radius": radius, "threshold": threshold,
"com_x": com_x, "com_y": com_y,
"target_x": tx, "target_y": ty,
}
return ax, ay, mode, dbg
-14
View File
@@ -207,17 +207,3 @@ def compute_action(dog_xy, dog_heading, sheep_positions,
omega = max(-1.0, min(1.0, OMEGA_GAIN * err / math.pi)) omega = max(-1.0, min(1.0, OMEGA_GAIN * err / math.pi))
return ax, ay, omega, mode return ax, ay, omega, mode
def compute_action_diff(dog_xy, dog_heading, sheep_positions,
pen_target=PEN_ENTRY):
"""Compatibility wrapper returning ``(vx, vy, mode)`` — same as Strömbom.
Use this when plugging into existing differential-drive code that
doesn't expect omega.
"""
vx, vy, _omega, mode = compute_action(
dog_xy, dog_heading, sheep_positions, pen_target,
drive_mode="differential",
)
return vx, vy, mode
-7
View File
@@ -1,7 +0,0 @@
make[1]: Entering directory '/run/host/home/johnnyf/Documents/Projects/TIR/project'
make DRIVE=differential WORLD=field
make[2]: Entering directory '/run/host/home/johnnyf/Documents/Projects/TIR/project'
python -m training.eval --policy training/runs/rl_differential_field \
--max-flock 10 --max-steps 15000 --n-seeds 10 \
--drive-mode differential --world field
make[2]: Leaving directory '/run/host/home/johnnyf/Documents/Projects/TIR/project'
Binary file not shown.