Johnny Fernandes a584a034e9 Project-wide cleanup: gitignore, dead code, stale artifacts, README

Repo hygiene pass after a long working session.

Files removed:
* stage1_train.log — runtime training log (~125 KB), shouldn't have
  been tracked.
* training/bc/demos.npz — orphan default-name demos file from before
  the world+drive-suffixed naming convention took over; no script
  references it.
* training/runs/bc_dagger{1,2}_differential_field/policy.zip — failed
  DAgger experiment artifacts. Per `memory/dagger_results.md` the
  whole DAgger experiment hit 0/5 on Webots transfer; these checkpoints
  have no consumers.

Untracked-but-deleted (no git change) — also cleaned from disk:
* Root-level runtime logs (43 *.log files, all unused — gitignored now).
* training/bc/{combined,dagger}*.npz (5 huge demo blobs, 2.6 GB
  reclaimed; not committed).
* training/bc/v1/ (2.6 GB backup of pre-DAgger demos; reclaimed).
* training/runs/at_20260426_*/ (orphan timestamped runs; reclaimed).
* All __pycache__/.

Dead code removed:
* `herding/control/strombom.py::compute_action_debug` — no callers
  anywhere in the repo.
* `herding/control/sequential.py::compute_action_debug` — same.
* `herding/control/universal.py::compute_action_diff` — same.

.gitignore extended to cover:
* All *.log files (training/eval/webots logs are runtime artifacts).
* training/bc/*.npz (re-collectable on demand by `make bc_demos`).
* training/bc/v1/.
* .pytest_cache, *.pyc, .claude/.

README refreshed:
* Mecanum + round-world coverage in the headline.
* Quick-start updated for DRIVE/WORLD-suffixed Makefile targets,
  GT-bypass example, and the mecanum-retrain caveat.
* Layout reflects the actual current tree (config.py, both protos,
  both worlds, all tools).
* Results table replaced with the Webots end-to-end numbers from
  the 2026-05-16 sweep (8/8 diff combos + LiDAR/GT comparison).

Verification: 126 pytest cases still pass (was 126 going in — no
test-coverage regression from the dead-code removal).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-17 01:38:19 +00:00

| Path | Last commit | Date |
|---|---|---|
| controllers | Consensus tracker + active scan close Webots 140° LiDAR gap | 2026-05-16 20:19:11 +00:00 |
| docs | Checkpoint 8 | 2026-05-12 22:41:03 +01:00 |
| herding | Project-wide cleanup: gitignore, dead code, stale artifacts, README | 2026-05-17 01:38:19 +00:00 |
| protos | Mecanum proto: replace cylinder wheels with physical roller hinges | 2026-05-16 21:54:35 +00:00 |
| tests | Gym mecanum kinematics matching to Webots roller-hinge proto | 2026-05-17 01:09:47 +00:00 |
| tools | Gym mecanum kinematics matching to Webots roller-hinge proto | 2026-05-17 01:09:47 +00:00 |
| training | Project-wide cleanup: gitignore, dead code, stale artifacts, README | 2026-05-17 01:38:19 +00:00 |
| worlds | Checkpoint 8 | 2026-05-12 22:41:03 +01:00 |
| .gitignore | Project-wide cleanup: gitignore, dead code, stale artifacts, README | 2026-05-17 01:38:19 +00:00 |
| Makefile | Webots sim-to-real fixes, DAgger pipeline, 360° proto variant | 2026-05-16 17:21:02 +00:00 |
| README.md | Project-wide cleanup: gitignore, dead code, stale artifacts, README | 2026-05-17 01:38:19 +00:00 |

README.md

Autonomous Shepherd-Dog Herding (Webots + RL)

Group G25 — Diogo Costa, Johnny Fernandes, Nelson Neto

A shepherd dog that herds 1–10 sheep through a 3 m gate into an external pen. The project covers two worlds (field, rectangular; field_round, circular), two drives (differential, mecanum), and four deployable control modes:

| Mode | Source | Role |
|---|---|---|
| `strombom` | Strömbom et al. (2014) collect/drive heuristic | Analytic baseline |
| `sequential` | Single-target "pin-and-push" | Alternative analytic baseline |
| `bc` | Behaviour cloning of the universal teacher | Imitation learning result |
| `rl` | KL-regularised PPO fine-tune of `bc` | Reward-driven refinement |

Perception

The dog perceives sheep only through its front-mounted 140° LiDAR (180 rays, 12 m max range — see protos/ShepherdDog.proto). Each control step:

1. Read lidar.getRangeImage().
2. Cluster returns into world-frame (x, y) estimates (herding/perception/lidar_perception.py).
3. Fold them into a multi-target tracker that maintains last-seen positions for sheep currently outside the FOV (herding/perception/sheep_tracker.py).
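The clustering step above can be sketched roughly as follows. This is a minimal illustration, not the code in herding/perception/lidar_perception.py: the function names, the `GAP` threshold, the `SHEEP_RADIUS` value, and the left-to-right ray ordering are all assumptions made for the example. Only the 140° FOV and 12 m range come from this README.

```python
import math

FOV_DEG = 140.0      # from the README
MAX_RANGE = 12.0     # from the README (m)
SHEEP_RADIUS = 0.3   # assumed value (m); the real constant lives in the repo
GAP = 0.5            # assumed max spacing between neighbouring hits in one cluster (m)


def _polar_dist(a, b):
    """Euclidean distance between two (range, bearing) hits in the sensor frame."""
    ax, ay = a[0] * math.cos(a[1]), a[0] * math.sin(a[1])
    bx, by = b[0] * math.cos(b[1]), b[0] * math.sin(b[1])
    return math.hypot(ax - bx, ay - by)


def cluster_scan(ranges, dog_x, dog_y, dog_heading):
    """Group neighbouring valid returns and convert each group to a world-frame (x, y)."""
    n = len(ranges)
    half_fov = math.radians(FOV_DEG) / 2.0
    hits = []
    for i, r in enumerate(ranges):
        if not math.isfinite(r) or r >= MAX_RANGE:
            continue  # no return on this ray
        # Assumption: ray 0 points to the far left, the last ray to the far right.
        bearing = half_fov - i * (2.0 * half_fov) / (n - 1)
        hits.append((r, bearing))

    # Split the ordered hits wherever two neighbours are further apart than GAP.
    clusters, current = [], []
    for hit in hits:
        if current and _polar_dist(current[-1], hit) > GAP:
            clusters.append(current)
            current = []
        current.append(hit)
    if current:
        clusters.append(current)

    centroids = []
    for c in clusters:
        # The LiDAR sees the sheep surface, so push the range out to the centre.
        r = sum(h[0] for h in c) / len(c) + SHEEP_RADIUS
        b = sum(h[1] for h in c) / len(c)
        angle = dog_heading + b
        centroids.append((dog_x + r * math.cos(angle), dog_y + r * math.sin(angle)))
    return centroids
```

For example, a 180-ray scan with two adjacent returns at 5 m straight ahead yields a single centroid about 5.3 m in front of the dog under these assumed constants.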

LiDAR validation (intermediate-goal item v from docs/project.md): during development a diagnostic-dump controller captured 80 real Webots scans plus the ground-truth sheep positions. Comparing detections against GT showed that clustered centroids match GT positions to within 0.15 m after the +SHEEP_RADIUS surface-to-centre correction. The LiDAR pipeline therefore produces correct sheep-position estimates from the real Webots scan, which validates the sensor for the herding task.

The tracker outputs a {name: (x, y)} dict shaped exactly like the prior receiver-based one, so Strömbom, Sequential, and the BC obs builder all run unchanged on top of it. The 2D Gymnasium env (herding/perception/lidar_sim.py) raycasts sheep discs at training time, so demos collected in the env match the perception the deployed controller sees in Webots.
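To illustrate the {name: (x, y)} interface and the last-seen memory, here is a minimal nearest-neighbour tracker sketch. It is not the implementation in herding/perception/sheep_tracker.py: the class name, the `MATCH_RADIUS` gate, the greedy matching, and the `sheep_N` naming are assumptions, and the consensus-promotion stage is left out entirely.

```python
import math

MATCH_RADIUS = 1.0  # assumed gating distance (m) for matching a detection to a track


class SheepTrackerSketch:
    """Greedy nearest-neighbour tracker that remembers unseen sheep."""

    def __init__(self):
        self.tracks = {}    # name -> (x, y), last-seen world-frame position
        self._next_id = 0

    def update(self, detections):
        """Fold one frame of (x, y) detections into the track table."""
        unmatched = list(detections)
        for name, (tx, ty) in list(self.tracks.items()):
            if not unmatched:
                break
            best = min(unmatched, key=lambda d: math.hypot(d[0] - tx, d[1] - ty))
            if math.hypot(best[0] - tx, best[1] - ty) <= MATCH_RADIUS:
                self.tracks[name] = best
                unmatched.remove(best)
            # Otherwise the track keeps its last-seen position
            # (the sheep is presumed to be outside the FOV).
        for d in unmatched:
            # Any detection that matched no track starts a new one.
            self.tracks[f"sheep_{self._next_id}"] = d
            self._next_id += 1
        return dict(self.tracks)
```

Because the output is a plain dict of positions, a controller that previously consumed receiver-based positions can consume this without caring whether a given entry was seen this step or remembered from an earlier one.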

Privileged ground-truth perception is available for ablation — HerdingEnv(use_lidar=False).

Quick start

# 1. Set up the Python env (any venv with PyTorch + SB3)
pip install -r training/requirements.txt

# 2. Smoke test (126 pytest cases, < 1 s)
make test

# 3. Reproduce a full pipeline (DRIVE+WORLD specific, ~1 h CPU)
make DRIVE=differential WORLD=field       # demos -> bc -> rl -> eval
make DRIVE=differential WORLD=field_round
make DRIVE=mecanum     WORLD=field        # see note below
make train_all                            # all 4 combos sequentially

# Individual stages (each rebuilds upstream artefacts if missing):
make DRIVE=differential WORLD=field bc_demos   # sim demos
make DRIVE=differential WORLD=field bc         # behaviour clone
make DRIVE=differential WORLD=field rl         # KL-PPO fine-tune
make DRIVE=differential WORLD=field eval       # 10-seed env eval

# 4. Run in Webots
tools/run_webots.sh 10 bc differential field        # BC, diff, rect field
tools/run_webots.sh 10 rl differential field_round  # RL, diff, round field
tools/run_webots.sh 5 strombom differential field   # analytic baseline
HERDING_USE_GT=1 tools/run_webots.sh 5 strombom differential field
                                                    # GT bypass for ablation

make help lists every target and the overridable hyperparameters.

Mecanum note: the ShepherdDogMecanum.proto uses physical roller hinges in Webots (committed 2026-05-16). The Webots calibration shows a ~60% strafe efficiency and ~28% backward bleed compared to textbook mecanum; the gym kinematics in HERDING_MEC_WEBOTS are tuned to match. Mecanum BC/RL policies need to be retrained against this preset — see mecanum_proto_gap.md in memory/ for the 3-command flow. The v1 policies in training/runs/{bc,rl}_mecanum_* predate the proto rewrite and will not herd reliably in Webots until retrained.

Documentation map

This README is the project overview: architecture, quick start, and headline results.
training/README.md has the command-level training and evaluation details for demo collection, BC, PPO fine-tuning, and policy artifacts.
docs/project.md is the original course proposal/goals document, kept for traceability rather than as run instructions.

Layout

herding/                  — perception / control / world primitives
  config.py               — frozen dataclasses for all tunable parameters;
                            named presets HERDING_DEFAULT / HERDING_WEBOTS /
                            HERDING_MEC_WEBOTS
  world/
    geometry.py             field/pen constants, world-shape switch
    diffdrive.py            differential + mecanum kinematics
    flocking_sim.py         Reynolds + Strömbom 2014 sheep dynamics
  perception/
    lidar_sim.py            fast 2D raycast for the gym env
    lidar_perception.py     scan → world-frame cluster centroids + filters
    sheep_tracker.py        multi-target NN tracker with FOV memory
                            and the consensus-promotion stage
    obs.py                  32-D order-invariant observation builder
  control/
    strombom.py             canonical CoM collect/drive heuristic
                            (round-world aware)
    sequential.py           single-target "pin-and-push" alternative
    universal.py            teacher used for BC demo collection
                            (Strömbom + mecanum omega + straggler recovery)
    active_scan.py          rotate-on-empty + walk-to-centre fallback
    modulation.py           shared near-sheep speed-modulation helper

controllers/
  sheep/sheep.py          — Webots sheep controller
  shepherd_dog/
    shepherd_dog.py       — Webots dog controller, mode-switched
    policy_loader.py      — SB3 PPO / RecurrentPPO loader with frame stack

training/
  herding_env.py          — Gymnasium env (LiDAR + tracker by default)
  bc/collect.py           — sim demos via the active-scan teacher
  bc/pretrain.py          — supervised BC into MLP
  rl/train.py             — KL-regularised PPO fine-tune of BC
  rl/train_lstm.py        — RecurrentPPO variant (ablation)
  eval.py                 — analytic + learned policy comparison harness
  runs/                   — checkpoints (whitelisted in .gitignore)
  requirements.txt

tests/                    — 126 pytest cases, < 1 s on CPU

tools/
  run_webots.sh           — launch Webots with N sheep + chosen mode + world
  webots_sweep.sh         — headless sweep across modes × drives × worlds
  webots_sweep_gt.sh      — same with HERDING_USE_GT=1 (perfect perception)
  calibrate_mecanum.sh    — measure mecanum body velocity vs gym prediction
  gen_mecanum_wheels.py   — regenerate the 32 mecanum roller hinges
  benchmark_lidar.py      — tracker quality benchmark

Makefile                  — pipeline orchestrator
                            (make DRIVE=… WORLD=… rl, make train_all, …)

worlds/
  field.wbt               — rectangular world (3 m gate, external pen)
  field_round.wbt         — circular world (radius 15 m, same pen)

protos/
  Sheep.proto             — sheep robot
  ShepherdDog.proto       — diff-drive dog, 140° LiDAR
  ShepherdDog360.proto    — diff-drive dog, 360° LiDAR (ablation)
  ShepherdDogMecanum.proto — 4-wheel mecanum with physical roller hinges

docs/project.md           — original course proposal/goals

Shared low-level control

Every dog mode (Strömbom, Sequential, BC, RL) routes its action through herding/control/modulation.py:modulate_speed_near_sheep, which scales action magnitude down when within ~2.5 m of the nearest tracked sheep. This stops the dog from charging in at full speed and scattering the flock. Direction (intent) is preserved.
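A rough sketch of this kind of distance-based scaling is shown below. It is not the code in herding/control/modulation.py: the function name, the linear ramp, and the `MIN_SCALE` floor are assumptions. Only the ~2.5 m radius comes from the text above.

```python
import math

SLOW_RADIUS = 2.5  # from the README (approximate, m)
MIN_SCALE = 0.3    # assumed floor so the dog never stops completely


def modulate_speed_sketch(action, dog_xy, sheep_positions):
    """Scale every action component by the same factor when a sheep is close."""
    if not sheep_positions:
        return tuple(action)
    nearest = min(
        math.hypot(x - dog_xy[0], y - dog_xy[1])
        for x, y in sheep_positions.values()
    )
    if nearest >= SLOW_RADIUS:
        return tuple(action)
    # Assumed linear ramp: MIN_SCALE at distance 0, 1.0 at SLOW_RADIUS.
    scale = MIN_SCALE + (1.0 - MIN_SCALE) * (nearest / SLOW_RADIUS)
    return tuple(scale * a for a in action)
```

Applying one scalar to all components is what keeps the direction of the action intact: the ratio between components is unchanged, only the magnitude shrinks.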

All modes also share the same EMA action smoother in controllers/shepherd_dog/shepherd_dog.py:ACTION_SMOOTH = 0.55.
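An exponential moving average over actions can be sketched as below. The README gives only the constant, so the convention here is an assumption: `ACTION_SMOOTH` is taken to be the weight on the previous smoothed action, and the function name is hypothetical.

```python
ACTION_SMOOTH = 0.55  # value from the README; its exact role is assumed here


def smooth_action(prev, new, alpha=ACTION_SMOOTH):
    """EMA step: keep `alpha` of the previous action, blend in the rest from the new one."""
    return tuple(alpha * p + (1.0 - alpha) * n for p, n in zip(prev, new))


# Step response from rest towards a constant command of (1.0, 1.0).
state = (0.0, 0.0)
history = []
for _ in range(3):
    state = smooth_action(state, (1.0, 1.0))
    history.append(state)
```

Under this convention the smoothed action closes 45% of the remaining gap to the commanded action on every control step, which damps abrupt sign flips in the policy output.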

Results — Webots end-to-end, canonical 140° LiDAR

Each cell is the step at which the dog had penned all N sheep in a single trial, run with HERDING_USE_GT=0 (LiDAR perception, no ground-truth bypass) and the default consensus tracker.

Differential drive

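| Mode | World | n=5 (steps) | n=10 (steps) |
|---|---|---|---|
| Strömbom | field | 7528 | 11620 |
| Strömbom | field_round | 8611 | 10339 |
| Sequential | field | 7135 | 16843 |
| Sequential | field_round | 6019 | 8494 |
| BC | field | 11698 | 15079 |
| BC | field_round | 7234 | 11320 |
| RL | field | 10039 | 13954 |
| RL | field_round | 5803 | 9151 |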

RL finishes in fewer steps than BC in all four (world, n) cells, although each cell is a single trial.
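The size of the RL-over-BC gap can be read off the table with a few lines of arithmetic. The step counts below are copied from the differential-drive table; the dictionary layout and function name are only for illustration.

```python
# Steps to pen all sheep, copied from the differential-drive results table.
steps = {
    ("field", 5): {"bc": 11698, "rl": 10039},
    ("field", 10): {"bc": 15079, "rl": 13954},
    ("field_round", 5): {"bc": 7234, "rl": 5803},
    ("field_round", 10): {"bc": 11320, "rl": 9151},
}


def reduction_pct(bc, rl):
    """Percentage of BC's steps that RL saves."""
    return 100.0 * (bc - rl) / bc


reductions = {
    cell: reduction_pct(v["bc"], v["rl"]) for cell, v in steps.items()
}

for (world, n), pct in reductions.items():
    print(f"{world:12s} n={n:<2d} RL saves {pct:.1f}% of BC's steps")
```

The saving ranges from roughly 7% (field, n=10) to roughly 20% (field_round, n=5). With one trial per cell these figures indicate a direction rather than a measured effect size.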

LiDAR vs GT bypass (diff drive)

GT bypass replaces the LiDAR tracker with perfect emitter positions. LiDAR is the default; GT is a perception ablation (HERDING_USE_GT=1):

| Mode | World | n=5 LiDAR | n=5 GT | n=10 LiDAR | n=10 GT |
|---|---|---|---|---|---|
| Strömbom | field | 7528 | 5254 | 11620 | 7342 |
| Strömbom | field_round | 8611 | 3631 | 10339 | 7084 |
| Sequential | field | 7135 | 11092 | 16843 | 8698 |
| Sequential | field_round | 6019 | 3454 | 8494 | 7324 |

GT is generally faster (perfect perception → fewer wasted steps). Sequential n=5 / field is the one cell where GT is slower — its straggler heuristic appears to over-correct when the dog has full information.

Mecanum (differential is the headline)

The ShepherdDogMecanum.proto was rewritten on 2026-05-16 with 32 physical roller hinges, giving true omnidirectional motion in Webots (tools/calibrate_mecanum.sh confirms the X-pattern). Calibration shows a strafe efficiency of ~60% relative to textbook mecanum (vs ~89% for forward motion), so the v1 mecanum BC/RL policies, which were trained on textbook gym mecanum, no longer herd reliably. The fix is staged but not run: the gym now has the HERDING_MEC_WEBOTS preset, which matches Webots' physical mecanum, and training/bc/collect.py and training/rl/train.py auto-select this preset for mecanum runs. Retraining (≈ 2 h per combo, 4 combos) is the documented future step.
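To make the efficiency figures concrete, the sketch below applies them to the standard textbook mecanum forward kinematics. It is not the HERDING_MEC_WEBOTS preset: the wheel radius, chassis half-dimensions, and function names are assumed values for illustration, and the ~28% backward bleed mentioned in the quick start is not modelled because its exact definition is not given here.

```python
WHEEL_RADIUS = 0.05      # assumed (m)
HALF_LENGTH = 0.15       # assumed half wheelbase (m)
HALF_WIDTH = 0.12        # assumed half track width (m)
FORWARD_EFFICIENCY = 0.89  # from the README (approximate)
STRAFE_EFFICIENCY = 0.60   # from the README (approximate)


def textbook_body_velocity(w_fl, w_fr, w_rl, w_rr):
    """Ideal mecanum forward kinematics (x forward, y left, wheel speeds in rad/s)."""
    r = WHEEL_RADIUS
    vx = r / 4.0 * (w_fl + w_fr + w_rl + w_rr)
    vy = r / 4.0 * (-w_fl + w_fr + w_rl - w_rr)
    omega = r / (4.0 * (HALF_LENGTH + HALF_WIDTH)) * (-w_fl + w_fr - w_rl + w_rr)
    return vx, vy, omega


def derated_body_velocity(w_fl, w_fr, w_rl, w_rr):
    """Textbook velocity with the measured per-axis efficiencies applied."""
    vx, vy, omega = textbook_body_velocity(w_fl, w_fr, w_rl, w_rr)
    return FORWARD_EFFICIENCY * vx, STRAFE_EFFICIENCY * vy, omega


forward = derated_body_velocity(10.0, 10.0, 10.0, 10.0)
strafe_left = derated_body_velocity(-10.0, 10.0, 10.0, -10.0)
```

With these assumed dimensions, a wheel command that would give 0.5 m/s sideways in the textbook model gives about 0.3 m/s after derating, which is the kind of mismatch a policy trained on the ideal model would not expect.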

License

Educational project for the Topics in Intelligent Robotics course.
