Mecanum proto rewrite in b3cf990 made the wheels truly omnidirectional
in Webots, but with asymmetric slip: forward command produces ~89% of
textbook speed while strafe produces only ~38% plus a consistent
~28% backward bleed-through. v1 BC/RL policies trained on perfect mecanum
gym kinematics could not herd under the new dynamics. To unblock that:
* `mecanum_kinematics_step` gains two parameters that scale the
  realised motion to match a deployed-platform calibration:
    - strafe_efficiency ∈ (0, 1]   default 1.0
    - strafe_to_forward_bleed      default 0.0
  Forward motion is untouched (textbook X-pattern continues to apply
  to vx_body); only the lateral channel is scaled and bleed is added.
* `RobotConfig` exposes both as drive-config fields with the same
  pass-through defaults so existing diff-drive code and existing
  mecanum training pipelines see no behaviour change.
* `HERDING_MEC_WEBOTS` preset bakes in the values measured against the
  current Webots mecanum proto (strafe_efficiency=0.4,
  strafe_to_forward_bleed=-0.28). Training mecanum BC/RL with this
  preset produces policies that compensate for the imperfect
  physical mecanum at deploy.
* `HerdingEnv` plumbs `RobotConfig.strafe_*` through to
  `mecanum_kinematics_step` so the preset takes effect.
* tools/gen_mecanum_wheels.py is added so the proto's 32 roller
  hinges can be regenerated by editing a single set of constants
  rather than hand-editing 1500+ lines of VRML.
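A minimal sketch of the calibration described in the first bullet, for
reviewers who want the arithmetic spelled out. The helper name
`apply_strafe_calibration` is hypothetical, and the choice to make the
bleed proportional to the magnitude of the commanded strafe is an
assumption; the actual `mecanum_kinematics_step` may differ.

```python
def apply_strafe_calibration(vx_cmd, vy_cmd,
                             strafe_efficiency=1.0,
                             strafe_to_forward_bleed=0.0):
    """Hypothetical helper: commanded body velocity -> realised body velocity."""
    if not 0.0 < strafe_efficiency <= 1.0:
        raise ValueError("strafe_efficiency must lie in (0, 1]")
    # Lateral channel: only a fraction of the commanded strafe is realised.
    vy_real = strafe_efficiency * vy_cmd
    # Forward channel: textbook value plus a bleed term.
    # ASSUMPTION: bleed scales with |vy_cmd|, so a negative coefficient
    # pushes the robot backward for either strafe direction.
    vx_real = vx_cmd + strafe_to_forward_bleed * abs(vy_cmd)
    return vx_real, vy_real


# Defaults are a pass-through.
print(apply_strafe_calibration(0.5, 0.2))
# Values quoted for the HERDING_MEC_WEBOTS preset, pure strafe command.
print(apply_strafe_calibration(0.0, 1.0, 0.4, -0.28))
```

With the defaults the command passes through unchanged, which is what
keeps existing pipelines unaffected.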
Tests:
* 4 new mecanum_kinematics_step tests (default pass-through, strafe
  scaling, backward bleed, forward unaffected by strafe params).
* 3 new RobotConfig tests (defaults, validation, preset shape).
* Sanity check: gym strafe with HERDING_MEC_WEBOTS over 100 steps
  reproduces the Webots calibration to 2 decimal places.
126 unit tests pass (was 120).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Autonomous Shepherd-Dog Herding (Webots + RL)
Group G25 — Diogo Costa, Johnny Fernandes, Nelson Neto
A differential-drive shepherd dog that herds 1–10 sheep through a 3 m gate into an external pen. The dog has three deployable modes:
| Mode | Source | Role |
|---|---|---|
| strombom | Strömbom et al. (2014) collect/drive heuristic | Analytic baseline |
| bc | Behaviour cloning of the Strömbom teacher | Imitation learning result |
| rl | KL-regularised PPO fine-tune of bc | Reward-driven refinement |
sequential (single-target pin-and-push) is kept as an alternative
analytic baseline.
Perception
The dog perceives sheep only through its front-mounted 140° LiDAR
(180 rays, 12 m max range — see protos/ShepherdDog.proto). Each
control step:
- Read lidar.getRangeImage(),
- Cluster returns into world-frame (x, y) estimates (herding/perception/lidar_perception.py),
- Fold them into a multi-target tracker that maintains last-seen positions for sheep currently outside the FOV (herding/perception/sheep_tracker.py).
LiDAR validation (intermediate-goal item v from docs/project.md):
during development a diagnostic-dump controller captured 80 real
Webots scans plus the ground-truth sheep positions. Comparing
detections against GT showed clustered centroids match GT positions
within 0.15 m after the +SHEEP_RADIUS surface-to-centre correction —
i.e. the LiDAR pipeline produces correct sheep-position estimates
from the real Webots scan, validating the sensor for the herding
task.
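The scan-to-centroid step, including the +SHEEP_RADIUS surface-to-centre correction, can be sketched roughly as follows. This is not the code in herding/perception/lidar_perception.py: the function name, the 0.3 m radius, the 0.5 m gap threshold, and the left-to-right ray ordering are all assumptions made for illustration.

```python
import math

# ASSUMPTIONS (not taken from the repo): radius value, gap threshold, ray order.
SHEEP_RADIUS = 0.3            # m, assumed sheep body radius
FOV = math.radians(140.0)     # field of view quoted in the text
MAX_RANGE = 12.0              # m, max range quoted in the text


def scan_to_centroids(ranges, dog_x, dog_y, dog_heading, gap=0.5):
    """Group adjacent LiDAR hits and return world-frame (x, y) centre estimates.

    Expects at least two rays so the angular step is defined.
    """
    n = len(ranges)
    clusters, current = [], []
    for i, r in enumerate(ranges):
        hit = math.isfinite(r) and r < MAX_RANGE
        # Close the open cluster on a miss or on a range discontinuity.
        if current and (not hit or abs(r - current[-1][1]) > gap):
            clusters.append(current)
            current = []
        if hit:
            current.append((i, r))
    if current:
        clusters.append(current)

    centroids = []
    for cluster in clusters:
        mean_idx = sum(i for i, _ in cluster) / len(cluster)
        # The nearest return lies on the body surface; push it out by the
        # body radius to estimate the centre (surface-to-centre correction).
        centre_range = min(r for _, r in cluster) + SHEEP_RADIUS
        # Ray 0 is assumed to be the leftmost ray, sweeping clockwise.
        bearing = dog_heading + FOV / 2.0 - FOV * mean_idx / (n - 1)
        centroids.append((dog_x + centre_range * math.cos(bearing),
                          dog_y + centre_range * math.sin(bearing)))
    return centroids
```

Using the nearest return rather than the mean range keeps the correction close to a pure radial offset, since rays that graze the edge of a disc report longer ranges than the ray through its centre.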
The tracker outputs a {name: (x, y)} dict shaped exactly like the
prior receiver-based one, so Strömbom, Sequential, and the BC obs
builder all run unchanged on top of it. The 2D Gymnasium env
(herding/perception/lidar_sim.py) raycasts sheep discs at training time, so
demos collected in the env match the perception the deployed
controller sees in Webots.
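To illustrate the tracker contract (a {name: (x, y)} dict that keeps last-seen positions for sheep outside the FOV), here is a minimal greedy nearest-neighbour sketch. The class name, the 1.5 m association gate, and the `sheep_<k>` naming scheme are hypothetical; the real herding/perception/sheep_tracker.py may associate detections differently.

```python
import math


class NearestNeighbourTracker:
    """Illustrative tracker: greedy nearest-neighbour association with memory."""

    def __init__(self, gate=1.5):
        self.gate = gate          # max association distance in metres (assumed)
        self.tracks = {}          # name -> last known (x, y)
        self._next_id = 0

    def update(self, detections):
        """Fold one frame of (x, y) detections into the track table."""
        unmatched = dict(self.tracks)
        for det in detections:
            best, best_d = None, self.gate
            for name, pos in unmatched.items():
                d = math.dist(pos, det)
                if d < best_d:
                    best, best_d = name, d
            if best is None:
                # No existing track is close enough: start a new one.
                best = f"sheep_{self._next_id}"
                self._next_id += 1
            else:
                # Each track is matched at most once per frame.
                del unmatched[best]
            self.tracks[best] = det
        # Tracks with no detection this frame keep their last-seen position.
        return dict(self.tracks)


tracker = NearestNeighbourTracker()
tracker.update([(0.0, 0.0), (5.0, 5.0)])
# Only one sheep is visible now; the other is remembered.
print(tracker.update([(0.2, 0.0)]))
```

Because the output has the same shape as a ground-truth position dict, downstream consumers do not need to know whether a position is fresh or remembered.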
Privileged ground-truth perception is available for ablation —
HerdingEnv(use_lidar=False).
Quick start
# 1. Set up the Python env (any venv with PyTorch + SB3)
pip install -r training/requirements.txt
# 2. Smoke test (70 pytest cases, < 1 s)
make test
# 3. Reproduce the full pipeline (~30–60 min CPU)
make # demos -> bc -> rl -> eval
# Individual stages (each rebuilds upstream artefacts if missing):
make bc_demos # sim demos
make bc # behaviour clone
make rl # KL-PPO fine-tune
make eval # 10-seed env eval of rl
# 4. Run in Webots
make webots N=10 MODE=bc # behaviour-cloned MLP
make webots N=10 MODE=rl # KL-PPO fine-tune
make webots N=10 MODE=strombom # analytic baseline
# (or invoke directly: tools/run_webots.sh 10 rl)
make help lists every target and the overridable hyperparameters
(e.g. make rl PPO_STEPS=2000000 KL=0.02).
Documentation map
- This README is the project overview: architecture, quick start, and headline results.
- training/README.md has the command-level training and evaluation details for demo collection, BC, PPO fine-tuning, and policy artifacts.
- docs/project.md is the original course proposal/goals document, kept for traceability rather than as run instructions.
Layout
herding/ — perception / control / world primitives
world/ — environment-side physics & geometry
geometry.py field/pen constants, robot specs
diffdrive.py differential-drive kinematics
flocking_sim.py Reynolds + Strömbom 2014 sheep dynamics
perception/ — LiDAR → tracked-sheep pipeline
lidar_sim.py fast 2D raycast for the env
lidar_perception.py scan → world-frame cluster centroids + filters
sheep_tracker.py multi-target NN tracker with FOV memory
obs.py 32-D order-invariant observation builder
control/ — every dog mode's action source
strombom.py canonical CoM collect/drive heuristic
sequential.py single-target "pin-and-push" alternative
active_scan.py wraps a base teacher with opening rotation +
walk-to-centre fallback
modulation.py shared near-sheep speed-modulation helper
controllers/
sheep/sheep.py — Webots sheep controller (uses herding.world.flocking_sim)
shepherd_dog/
shepherd_dog.py — Webots dog controller, mode-switched
policy_loader.py — lazy SB3 policy loader (auto-detects frame stack)
training/
herding_env.py — Gymnasium env (LiDAR + tracker by default)
bc/collect.py — sim demos via the active-scan teacher
bc/pretrain.py — supervised BC of (obs, action) demos into MLP
rl/train.py — KL-regularised PPO fine-tune of BC
eval.py — analytic + learned policy comparison harness
bc/demos.npz — collected demonstrations (gitignored)
runs/ — checkpoints (whitelisted in .gitignore)
requirements.txt
tests/
conftest.py — pytest setup (adds project root to sys.path)
test_geometry.py — geometric predicates + constants
test_diffdrive.py — kinematics and (vx, vy) → wheel-speed map
test_obs.py — observation builder (shape, normalisation, order)
test_control.py — speed modulation + analytic teachers + active scan
test_perception.py — LiDAR sim + clustering + tracker
test_env.py — Gymnasium contract + determinism + reward
tools/
run_webots.sh — launch Webots with N sheep + chosen mode
Makefile — pipeline orchestrator (make / make rl / make test / …)
worlds/
field.wbt — main world (3 m gate, external pen)
protos/ — Sheep / ShepherdDog robot definitions
docs/project.md — original course proposal/goals
Shared low-level control
Every dog mode (Strömbom, Sequential, BC, RL) routes its action
through herding/control/modulation.py:modulate_speed_near_sheep,
which scales action magnitude down when within ~2.5 m of the nearest
tracked sheep. This stops the dog from charging in at full speed and
scattering the flock. Direction (intent) is preserved.
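A minimal sketch of this kind of magnitude-only modulation is shown below. It is not the repo's modulate_speed_near_sheep: the function name, the linear ramp, and the 0.3 floor are assumptions; only the ~2.5 m radius comes from the text.

```python
import math


def modulate_speed_sketch(action, dog_pos, sheep_positions,
                          slow_radius=2.5, min_scale=0.3):
    """Scale a (vx, vy) action down when the nearest tracked sheep is close.

    slow_radius follows the ~2.5 m figure in the text; the linear ramp and
    the min_scale floor are illustrative assumptions.
    """
    if not sheep_positions:
        return action
    nearest = min(math.dist(dog_pos, p) for p in sheep_positions.values())
    if nearest >= slow_radius:
        return action
    # Linear ramp from min_scale (touching) up to 1.0 (at slow_radius).
    scale = min_scale + (1.0 - min_scale) * nearest / slow_radius
    # Both components get the same factor, so the direction is unchanged.
    return (action[0] * scale, action[1] * scale)


print(modulate_speed_sketch((1.0, 0.5), (0.0, 0.0), {"sheep_0": (1.25, 0.0)}))
```

Applying one scalar to both components is what keeps the intended direction intact while reducing speed.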
All modes also share the same EMA action smoother in
controllers/shepherd_dog/shepherd_dog.py:ACTION_SMOOTH = 0.55.
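For reference, an exponential moving average over actions can be sketched as below. The text only gives the constant 0.55; treating it as the weight on the previous smoothed action (rather than on the new action) is an assumption, and the function name is hypothetical.

```python
ACTION_SMOOTH = 0.55  # value quoted in the text


def smooth_action(previous, new, alpha=ACTION_SMOOTH):
    """EMA step: blend the previous smoothed action with the new raw action.

    ASSUMPTION: alpha weights the previous action, so a larger alpha means
    heavier smoothing. The controller may use the opposite convention.
    """
    return tuple(alpha * p + (1.0 - alpha) * n for p, n in zip(previous, new))


state = (0.0, 0.0)
for _ in range(3):
    # Repeatedly commanding (1, 0) approaches it geometrically.
    state = smooth_action(state, (1.0, 0.0))
    print(state)
```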
Results — env eval, 10 seeds × n=1..10
max_steps=15000, full-field spawn distribution. Success rate per
flock size, then mean number of sheep penned per episode for selected flock sizes.
Success rate (%)
| n | Strömbom | bc | rl |
|---|---|---|---|
| 1 | 30 | 80 | 90 |
| 2 | 90 | 50 | 90 |
| 3 | 60 | 90 | 90 |
| 4 | 40 | 80 | 90 |
| 5 | 60 | 70 | 100 |
| 6 | 30 | 80 | 80 |
| 7 | 70 | 80 | 100 |
| 8 | 30 | 100 | 100 |
| 9 | 40 | 90 | 100 |
| 10 | 50 | 100 | 100 |
Mean penned per episode (out of n)
| n | Strömbom | bc | rl |
|---|---|---|---|
| 1 | 0.30 | 0.80 | 0.90 |
| 5 | 3.90 | 4.10 | 5.00 |
| 8 | 4.20 | 8.00 | 8.00 |
| 10 | 7.40 | 10.00 | 10.00 |
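The two tables can be derived from per-episode records along the following lines. This is a sketch, not training/eval.py: the function name and record layout are hypothetical, and it assumes an episode counts as a success only when every sheep is penned (which is consistent with, for example, the n=1 row, where success rate and mean penned coincide).

```python
from collections import defaultdict


def summarise(episodes):
    """Aggregate (n_sheep, n_penned) records, one per seed, by flock size."""
    by_n = defaultdict(list)
    for n_sheep, n_penned in episodes:
        by_n[n_sheep].append(n_penned)

    summary = {}
    for n_sheep, penned in sorted(by_n.items()):
        # ASSUMPTION: success means the whole flock was penned.
        successes = sum(1 for p in penned if p == n_sheep)
        summary[n_sheep] = {
            "success_rate_pct": 100.0 * successes / len(penned),
            "mean_penned": sum(penned) / len(penned),
        }
    return summary


# Four made-up seeds at n=2, purely to show the call shape.
print(summarise([(2, 2), (2, 1), (2, 2), (2, 0)]))
```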
Takeaways
- BC beats Strömbom at every flock size except n=2 (50% vs 90%) under realistic LiDAR conditions (full field, partial observability). Strömbom struggles on small flocks, where a single sheep can spawn beyond the LiDAR's 12 m range; BC learned active perception from the demos.
- RL refines BC without regressing on any cell: it ties or beats BC at every flock size, with the largest success-rate gains at n=2 (50% → 90%) and n=5 (70% → 100%), where BC's imitation of Strömbom's drive heuristic was sub-optimal.
- Aggressive reward shaping doesn't help — a more aggressive variant (β=0.02, W_TIME=-0.1, W_IMITATE=0, 3 M steps) trained as an ablation was strictly worse than the conservative tune shipped here (β=0.05, W_IMITATE=0.5, 1 M steps).
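As background for the β values above, a KL-regularised objective adds a penalty that keeps the fine-tuned policy close to the BC policy. The sketch below assumes diagonal-Gaussian action distributions, reads β as the weight on KL(π_RL ‖ π_BC), and uses hypothetical function names; training/rl/train.py may define the penalty and its direction differently.

```python
import math


def diag_gaussian_kl(mu_p, std_p, mu_q, std_q):
    """KL(p || q) between two diagonal Gaussians, summed over dimensions."""
    return sum(
        math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2.0 * sq ** 2) - 0.5
        for mp, sp, mq, sq in zip(mu_p, std_p, mu_q, std_q)
    )


def kl_regularised_loss(ppo_loss, kl_to_bc, beta=0.05):
    """PPO loss plus a penalty for drifting away from the BC policy.

    ASSUMPTION: beta is the KL coefficient; 0.05 is the shipped value
    quoted in the text.
    """
    return ppo_loss + beta * kl_to_bc


# Current policy mean has drifted by 1.0 in one action dimension.
kl = diag_gaussian_kl([1.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])
print(kl, kl_regularised_loss(1.0, kl))
```

A larger β pulls the policy harder toward the BC prior, so the lower β in the aggressive ablation gave the reward terms more freedom to move the policy away from it.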
License
Educational project for the Topics in Intelligent Robotics course.