Johnny Fernandes 2d23289052 Consensus tracker + active scan close Webots 140° LiDAR gap
Two deploy-time fixes that take v1 360°-trained BC/RL from 0/n to n/n
penned on the canonical 140° LiDAR proto for diff/field:

* SheepTracker now supports a consensus stage: new detections start as
  candidate tracks invisible to get_positions(). A candidate must
  accumulate consensus_k matches within consensus_radius_m of itself
  inside a consensus_max_age window to be promoted; otherwise it
  expires. Real sheep self-confirm within 3 frames (≪0.05 m/step);
  wall-return cluster centroids jitter beyond 0.3 m as the dog moves
  and never promote. consensus_k=1 (default) is a no-op so unconfigured
  callers and HERDING_DEFAULT keep prior behaviour.
* HERDING_WEBOTS preset gets consensus_k=3, radius=0.3, max_age=20,
  plus longer forget_steps=300 and predict_steps=180 so confirmed
  sheep persist through long FOV-occlusion gaps a narrow 140° cone
  produces. max_new_tracks_per_step=1 still rate-caps spawn bursts.
* shepherd_dog.py BC/RL empty-obs fallback now rotates the desired
  heading with step_count so the cone actively sweeps the field
  instead of driving due north into the wall.

Verified in headless Webots (HERDING_USE_GT=0, LiDAR only):
  BC diff/field:        5/5 @ 11698, 10/10 @ 15079
  RL diff/field:        5/5 @ 10039, 9/10 @ 18200 (timeout)
  Strömbom diff/field:  5/5 @ 7528
All previously 0/n. 120 unit tests pass; 9 new consensus tests cover
the candidate stage, promotion radius, and one-shot phantom rejection.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 20:19:11 +00:00

Autonomous Shepherd-Dog Herding (Webots + RL)

Group G25 — Diogo Costa, Johnny Fernandes, Nelson Neto

A differential-drive shepherd dog that herds 1 to 10 sheep through a 3 m gate into an external pen. The dog has three deployable modes:

Mode      Source                                           Role
strombom  Strömbom et al. (2014) collect/drive heuristic   Analytic baseline
bc        Behaviour cloning of the Strömbom teacher        Imitation learning result
rl        KL-regularised PPO fine-tune of bc               Reward-driven refinement

sequential (single-target pin-and-push) is kept as an alternative analytic baseline.
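
For orientation, the collect/drive rule from Strömbom et al. (2014) that the strombom mode and the BC teacher build on can be sketched as below. This is a minimal illustration, not the code in herding/control/strombom.py: the function name, the r_a value, and the return format are assumptions made for the example.

```python
import numpy as np


def strombom_target(sheep_xy, goal_xy, r_a=2.0):
    """Pick the point the dog should steer towards (illustrative only).

    r_a is the sheep-sheep repulsion distance from the paper; 2.0 is an
    assumed value, not one read from this repository.
    """
    sheep = np.asarray(sheep_xy, dtype=float)
    goal = np.asarray(goal_xy, dtype=float)
    n = len(sheep)
    gcm = sheep.mean(axis=0)                      # flock centre of mass
    dists = np.linalg.norm(sheep - gcm, axis=1)

    if dists.max() <= r_a * n ** (2.0 / 3.0):
        # Flock is compact: stand behind it, opposite the goal, and drive.
        away = gcm - goal
        away /= np.linalg.norm(away) + 1e-9
        return "drive", gcm + r_a * np.sqrt(n) * away

    # Flock is scattered: stand behind the furthest stray and collect it.
    stray = sheep[np.argmax(dists)]
    out = stray - gcm
    out /= np.linalg.norm(out) + 1e-9
    return "collect", stray + r_a * out
```

With two sheep close together the rule returns "drive" and a point on the far side of the flock from the goal; with one distant stray it returns "collect" and a point just beyond that stray.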

Perception

The dog perceives sheep only through its front-mounted 140° LiDAR (180 rays, 12 m max range — see protos/ShepherdDog.proto). Each control step:

  1. Read lidar.getRangeImage(),
  2. Cluster returns into world-frame (x, y) estimates (herding/perception/lidar_perception.py),
  3. Fold them into a multi-target tracker that maintains last-seen positions for sheep currently outside the FOV (herding/perception/sheep_tracker.py).
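
The consensus stage described in the commit notes at the top of this page (candidate tracks that must collect consensus_k nearby matches before get_positions() reports them) can be sketched as follows. This is a simplified stand-in, not the SheepTracker in herding/perception/sheep_tracker.py: the ConsensusGate class, its fields, and the reuse of the consensus radius for matching confirmed tracks are assumptions; only the three parameter names and get_positions() come from the notes.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    x: float
    y: float
    hits: int = 1   # detections matched so far, including the first
    age: int = 0    # steps since the candidate was created


class ConsensusGate:
    """Hypothetical, stripped-down consensus stage for illustration."""

    def __init__(self, consensus_k=3, consensus_radius_m=0.3, consensus_max_age=20):
        self.k = consensus_k
        self.radius = consensus_radius_m
        self.max_age = consensus_max_age
        self.candidates = []   # unconfirmed, hidden from get_positions()
        self.confirmed = []    # promoted tracks as (x, y)

    def _nearest(self, points, x, y):
        """Index of the closest point within the consensus radius, else None."""
        best, best_d = None, self.radius
        for i, (px, py) in enumerate(points):
            d = math.hypot(px - x, py - y)
            if d <= best_d:
                best, best_d = i, d
        return best

    def step(self, detections):
        for c in self.candidates:
            c.age += 1
        for x, y in detections:
            i = self._nearest(self.confirmed, x, y)
            if i is not None:              # known sheep: just refresh it
                self.confirmed[i] = (x, y)
                continue
            j = self._nearest([(c.x, c.y) for c in self.candidates], x, y)
            if j is None:                  # unseen: start a new candidate
                self.candidates.append(Candidate(x, y))
            else:                          # re-seen: count one more match
                c = self.candidates[j]
                c.x, c.y, c.hits = x, y, c.hits + 1
        # Promote candidates with enough matches, expire the stale ones.
        self.confirmed += [(c.x, c.y) for c in self.candidates if c.hits >= self.k]
        self.candidates = [c for c in self.candidates
                           if c.hits < self.k and c.age <= self.max_age]

    def get_positions(self):
        return list(self.confirmed)
```

A slow-moving detection is promoted on its third frame, while a centroid that jumps more than the radius each frame only spawns fresh candidates that later expire. With consensus_k=1 every detection is promoted in the step it appears, which matches the no-op default described in the notes.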

LiDAR validation (intermediate-goal item v from docs/project.md): during development a diagnostic-dump controller captured 80 real Webots scans plus the ground-truth sheep positions. Comparing detections against GT showed clustered centroids match GT positions within 0.15 m after the +SHEEP_RADIUS surface-to-centre correction — i.e. the LiDAR pipeline produces correct sheep-position estimates from the real Webots scan, validating the sensor for the herding task.
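
The surface-to-centre correction mentioned above exists because a LiDAR return lies on the near surface of a sheep, not at its centre. A minimal sketch of the idea, assuming the cluster centroid is available as a bearing and range in the dog frame; the function name and the SHEEP_RADIUS value of 0.25 m are placeholders, not values taken from the repository:

```python
import math

SHEEP_RADIUS = 0.25  # metres; assumed value for illustration only


def surface_to_centre(dog_x, dog_y, dog_yaw, bearing, surface_range,
                      sheep_radius=SHEEP_RADIUS):
    """Convert a dog-frame (bearing, range) surface hit to a world-frame centre.

    The range is pushed outwards by one sheep radius along the ray, then the
    polar point is rotated by the dog's yaw and translated by its position.
    """
    r = surface_range + sheep_radius
    a = dog_yaw + bearing
    return dog_x + r * math.cos(a), dog_y + r * math.sin(a)
```

For a dog at the origin facing +x, a return at 4.75 m straight ahead maps to a sheep centre at (5.0, 0.0).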

The tracker outputs a {name: (x, y)} dict shaped exactly like the prior receiver-based one, so Strömbom, Sequential, and the BC obs builder all run unchanged on top of it. The 2D Gymnasium env (herding/perception/lidar_sim.py) raycasts sheep discs at training time, so demos collected in the env match the perception the deployed controller sees in Webots.
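
Raycasting sheep discs, as the training env does, reduces to a ray-circle intersection per ray. The sketch below shows one way to do it under stated assumptions: the function names, the left-to-right ray ordering, and returning max_range for a miss are choices made for this example and may differ from herding/perception/lidar_sim.py. The 140° field of view, 180 rays, and 12 m range are the figures quoted earlier.

```python
import math


def ray_disc_range(ox, oy, angle, cx, cy, radius, max_range=12.0):
    """Distance along a ray from (ox, oy) to a disc, or max_range on a miss."""
    dx, dy = math.cos(angle), math.sin(angle)
    fx, fy = cx - ox, cy - oy
    t_closest = fx * dx + fy * dy            # projection of centre onto the ray
    d2 = fx * fx + fy * fy - t_closest ** 2  # squared ray-to-centre distance
    if d2 > radius ** 2:
        return max_range                     # ray passes beside the disc
    t_hit = t_closest - math.sqrt(radius ** 2 - d2)
    if t_hit < 0.0 or t_hit > max_range:
        return max_range                     # disc is behind the ray or too far
    return t_hit


def scan(ox, oy, yaw, discs, fov_deg=140.0, n_rays=180, max_range=12.0):
    """Simulate one scan; discs is an iterable of (cx, cy, radius)."""
    half = math.radians(fov_deg) / 2.0
    ranges = []
    for i in range(n_rays):
        # Assumed ordering: first ray at +half (left), last at -half (right).
        a = yaw + half - i * (2.0 * half) / (n_rays - 1)
        hits = (ray_disc_range(ox, oy, a, cx, cy, r, max_range)
                for cx, cy, r in discs)
        ranges.append(min(hits, default=max_range))
    return ranges
```

A disc directly behind the dog leaves every ray at max_range, which is the FOV gap the tracker's memory is meant to bridge.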

Privileged ground-truth perception is available for ablation — HerdingEnv(use_lidar=False).

Quick start

# 1. Set up the Python env (any venv with PyTorch + SB3)
pip install -r training/requirements.txt

# 2. Smoke test (70 pytest cases, < 1 s)
make test

# 3. Reproduce the full pipeline (~30-60 min CPU)
make            # demos -> bc -> rl -> eval

# Individual stages (each rebuilds upstream artefacts if missing):
make bc_demos   # sim demos
make bc         # behaviour clone
make rl         # KL-PPO fine-tune
make eval       # 10-seed env eval of rl

# 4. Run in Webots
make webots N=10 MODE=bc          # behaviour-cloned MLP
make webots N=10 MODE=rl          # KL-PPO fine-tune
make webots N=10 MODE=strombom    # analytic baseline
# (or invoke directly: tools/run_webots.sh 10 rl)

make help lists every target and the overridable hyperparameters (e.g. make rl PPO_STEPS=2000000 KL=0.02).

Documentation map

  • This README is the project overview: architecture, quick start, and headline results.
  • training/README.md has the command-level training and evaluation details for demo collection, BC, PPO fine-tuning, and policy artifacts.
  • docs/project.md is the original course proposal/goals document, kept for traceability rather than as run instructions.

Layout

herding/                  — perception / control / world primitives
  world/                  — environment-side physics & geometry
    geometry.py             field/pen constants, robot specs
    diffdrive.py            differential-drive kinematics
    flocking_sim.py         Reynolds + Strömbom 2014 sheep dynamics
  perception/             — LiDAR → tracked-sheep pipeline
    lidar_sim.py            fast 2D raycast for the env
    lidar_perception.py     scan → world-frame cluster centroids + filters
    sheep_tracker.py        multi-target NN tracker with FOV memory
    obs.py                  32-D order-invariant observation builder
  control/                — every dog mode's action source
    strombom.py             canonical CoM collect/drive heuristic
    sequential.py           single-target "pin-and-push" alternative
    active_scan.py          wraps a base teacher with opening rotation +
                            walk-to-centre fallback
    modulation.py           shared near-sheep speed-modulation helper

controllers/
  sheep/sheep.py          — Webots sheep controller (uses herding.world.flocking_sim)
  shepherd_dog/
    shepherd_dog.py       — Webots dog controller, mode-switched
    policy_loader.py      — lazy SB3 policy loader (auto-detects frame stack)

training/
  herding_env.py          — Gymnasium env (LiDAR + tracker by default)
  bc/collect.py           — sim demos via the active-scan teacher
  bc/pretrain.py          — supervised BC of (obs, action) demos into MLP
  rl/train.py             — KL-regularised PPO fine-tune of BC
  eval.py                 — analytic + learned policy comparison harness
  bc/demos.npz            — collected demonstrations (gitignored)
  runs/                   — checkpoints (whitelisted in .gitignore)
  requirements.txt

tests/
  conftest.py             — pytest setup (adds project root to sys.path)
  test_geometry.py        — geometric predicates + constants
  test_diffdrive.py       — kinematics and (vx, vy) → wheel-speed map
  test_obs.py             — observation builder (shape, normalisation, order)
  test_control.py         — speed modulation + analytic teachers + active scan
  test_perception.py      — LiDAR sim + clustering + tracker
  test_env.py             — Gymnasium contract + determinism + reward

tools/
  run_webots.sh           — launch Webots with N sheep + chosen mode

Makefile                  — pipeline orchestrator (make / make rl / make test / …)

worlds/
  field.wbt               — main world (3 m gate, external pen)

protos/                   — Sheep / ShepherdDog robot definitions
docs/project.md           — original course proposal/goals

Shared low-level control

Every dog mode (Strömbom, Sequential, BC, RL) routes its action through herding/control/modulation.py:modulate_speed_near_sheep, which scales action magnitude down when within ~2.5 m of the nearest tracked sheep. This stops the dog from charging in at full speed and scattering the flock. Direction (intent) is preserved.
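
A minimal sketch of this kind of speed modulation, assuming a linear ramp and a floor on the scale factor. Only the ~2.5 m radius and the "scale magnitude, keep direction" behaviour come from the text; the function name, signature, and min_scale value are illustrative and are not the signature of modulate_speed_near_sheep.

```python
import math


def modulate_speed(action, dog_xy, sheep_positions,
                   slow_radius=2.5, min_scale=0.3):
    """Scale a (vx, vy) action down when the nearest tracked sheep is close.

    sheep_positions is a {name: (x, y)} dict like the tracker output.
    min_scale and the linear ramp are assumptions for this sketch.
    """
    if not sheep_positions:
        return action
    nearest = min(math.hypot(sx - dog_xy[0], sy - dog_xy[1])
                  for sx, sy in sheep_positions.values())
    if nearest >= slow_radius:
        return action
    # Ramp from min_scale (touching the sheep) up to 1.0 at slow_radius.
    scale = min_scale + (1.0 - min_scale) * nearest / slow_radius
    # Both components share one factor, so the heading is unchanged.
    return (action[0] * scale, action[1] * scale)
```

Because both components are multiplied by the same positive factor, the direction of the commanded velocity is preserved and only its magnitude shrinks.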

All modes also share the same EMA action smoother in controllers/shepherd_dog/shepherd_dog.py:ACTION_SMOOTH = 0.55.
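
An exponential moving average (EMA) smoother of this kind can be sketched as follows. The ACTION_SMOOTH value is the one quoted above; whether the constant weights the previous output or the new command in shepherd_dog.py is not stated here, so the convention below (weight on the previous output) and the ActionSmoother class itself are assumptions.

```python
ACTION_SMOOTH = 0.55  # value quoted in the text


class ActionSmoother:
    """EMA over (vx, vy) actions; assumes alpha weights the previous output."""

    def __init__(self, alpha=ACTION_SMOOTH):
        self.alpha = alpha
        self.prev = (0.0, 0.0)

    def __call__(self, action):
        a = self.alpha
        # smoothed = alpha * previous + (1 - alpha) * new command
        self.prev = tuple(a * p + (1.0 - a) * x
                          for p, x in zip(self.prev, action))
        return self.prev
```

Under this convention a step change in the commanded action reaches 45% of its value on the first control step and about 70% on the second, which damps abrupt direction flips.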

Results — env eval, 10 seeds × n=1..10

max_steps=15000, full-field spawn distribution. Success rate per flock size, then mean number of sheep penned per episode for selected flock sizes.

Success rate (%)

n    Strömbom   bc    rl
1    30         80    90
2    90         50    90
3    60         90    90
4    40         80    90
5    60         70    100
6    30         80    80
7    70         80    100
8    30         100   100
9    40         90    100
10   50         100   100

Mean penned per episode (out of n)

n    Strömbom   bc      rl
1    0.30       0.80    0.90
5    3.90       4.10    5.00
8    4.20       8.00    8.00
10   7.40       10.00   10.00

Takeaways

  • BC beats Strömbom at nine of the ten flock sizes under realistic LiDAR conditions (full field, partial observability); the exception is n=2 (50% vs 90%). Strömbom struggles on small flocks where a single sheep can spawn beyond the LiDAR's 12 m range; BC learned active perception from the demos.
  • RL refines BC without regressing on any cell: it ties or beats BC at every flock size. In the success-rate table the largest gains are at n=2 (50% to 90%) and n=5 (70% to 100%); the gains are attributed to cases where BC's imitation of Strömbom's drive heuristic was sub-optimal.
  • Aggressive reward shaping doesn't help — a more aggressive variant (β=0.02, W_TIME=-0.1, W_IMITATE=0, 3 M steps) trained as an ablation was strictly worse than the conservative tune shipped here (β=0.05, W_IMITATE=0.5, 1 M steps).
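
The β values above weight a KL term that keeps the fine-tuned policy close to the BC policy. A minimal numerical sketch of such a penalty, assuming diagonal-Gaussian action distributions and a KL(new || BC) direction; both are assumptions, and the function names are hypothetical rather than taken from training/rl/train.py.

```python
import numpy as np


def diag_gaussian_kl(mu_p, std_p, mu_q, std_q):
    """KL(p || q) for diagonal Gaussians, summed over the action dimensions."""
    mu_p, std_p = np.asarray(mu_p, float), np.asarray(std_p, float)
    mu_q, std_q = np.asarray(mu_q, float), np.asarray(std_q, float)
    per_dim = (np.log(std_q / std_p)
               + (std_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * std_q ** 2)
               - 0.5)
    return per_dim.sum(axis=-1)


def kl_regularised_loss(ppo_loss, mu_new, std_new, mu_bc, std_bc, beta=0.05):
    """Add beta * mean KL(new || BC) to an already computed PPO loss."""
    kl = np.mean(diag_gaussian_kl(mu_new, std_new, mu_bc, std_bc))
    return ppo_loss + beta * kl
```

A larger β pulls the policy harder towards the BC prior; the shipped tune uses the larger of the two values listed above (0.05 vs 0.02), in line with the observation that the less constrained variant did worse.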

License

Educational project for the Topics in Intelligent Robotics course.
