Project-wide cleanup: gitignore, dead code, stale artifacts, README
Repo hygiene pass after a long working session.
Files removed:
* stage1_train.log — runtime training log (~125 KB), shouldn't have
been tracked.
* training/bc/demos.npz — orphan default-name demos file from before
the world+drive-suffixed naming convention took over; no script
references it.
* training/runs/bc_dagger{1,2}_differential_field/policy.zip — failed
DAgger experiment artifacts. Per `memory/dagger_results.md` the
whole DAgger experiment hit 0/5 on Webots transfer; these checkpoints
have no consumers.
Untracked-but-deleted (no git change) — also cleaned from disk:
* Root-level runtime logs (43 *.log files, all unused — gitignored now).
* training/bc/{combined,dagger}*.npz (5 huge demo blobs, 2.6 GB
reclaimed; not committed).
* training/bc/v1/ (2.6 GB backup of pre-DAgger demos; reclaimed).
* training/runs/at_20260426_*/ (orphan timestamped runs; reclaimed).
* All __pycache__/.
Dead code removed:
* `herding/control/strombom.py::compute_action_debug` — no callers
anywhere in the repo.
* `herding/control/sequential.py::compute_action_debug` — same.
* `herding/control/universal.py::compute_action_diff` — same.
.gitignore extended to cover:
* All *.log files (training/eval/webots logs are runtime artifacts).
* training/bc/*.npz (re-collectable on demand by `make bc_demos`).
* training/bc/v1/.
* .pytest_cache, *.pyc, .claude/.
README refreshed:
* Mecanum + round-world coverage in the headline.
* Quick-start updated for DRIVE/WORLD-suffixed Makefile targets,
GT-bypass example, and the mecanum-retrain caveat.
* Layout reflects the actual current tree (config.py, both protos,
both worlds, all tools).
* Results table replaced with the Webots end-to-end numbers from
the 2026-05-16 sweep (8/8 diff combos + LiDAR/GT comparison).
Verification: 126 pytest cases still pass (was 126 going in — no
test-coverage regression from the dead-code removal).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
+17
-1
@@ -1,5 +1,7 @@
|
|||||||
# Python
|
# Python
|
||||||
__pycache__/
|
__pycache__/
|
||||||
|
*.pyc
|
||||||
|
.pytest_cache/
|
||||||
|
|
||||||
# Training artefacts: ignore all run outputs except deployable policies
|
# Training artefacts: ignore all run outputs except deployable policies
|
||||||
training/runs/**
|
training/runs/**
|
||||||
@@ -8,8 +10,22 @@ training/runs/**
|
|||||||
!training/runs/*/
|
!training/runs/*/
|
||||||
!training/runs/*/policy.zip
|
!training/runs/*/policy.zip
|
||||||
|
|
||||||
# Webots launcher scratch
|
# BC demo blobs — these get regenerated by `python -m training.bc.collect`
|
||||||
|
# and are too large to track. Keep them out of git.
|
||||||
|
training/bc/*.npz
|
||||||
|
training/bc/v1/
|
||||||
|
|
||||||
|
# Webots launcher scratch (the _test.wbt files are emitted on every run)
|
||||||
worlds/**
|
worlds/**
|
||||||
!worlds/field.wbt
|
!worlds/field.wbt
|
||||||
!worlds/field_round.wbt
|
!worlds/field_round.wbt
|
||||||
herding_runtime.cfg
|
herding_runtime.cfg
|
||||||
|
|
||||||
|
# Runtime logs — all of these are produced by training/eval/webots runs
|
||||||
|
# and are not useful to track in git. Keep summary docs/markdown only.
|
||||||
|
*.log
|
||||||
|
calibrate_mecanum.log
|
||||||
|
training/.run_done
|
||||||
|
|
||||||
|
# Tooling
|
||||||
|
.claude/
|
||||||
|
|||||||
@@ -2,18 +2,18 @@
|
|||||||
|
|
||||||
Group G25 — *Diogo Costa, Johnny Fernandes, Nelson Neto*
|
Group G25 — *Diogo Costa, Johnny Fernandes, Nelson Neto*
|
||||||
|
|
||||||
A differential-drive shepherd dog that herds 1–10 sheep through a 3 m
|
A shepherd dog that herds 1–10 sheep through a 3 m gate into an
|
||||||
gate into an external pen. The dog has three deployable modes:
|
external pen. Two worlds (`field` rectangular, `field_round` circular),
|
||||||
|
two drives (`differential`, `mecanum`), and four deployable control
|
||||||
|
modes:
|
||||||
|
|
||||||
| Mode | Source | Role |
|
| Mode | Source | Role |
|
||||||
|---|---|---|
|
|---|---|---|
|
||||||
| `strombom` | Strömbom et al. (2014) collect/drive heuristic | Analytic baseline |
|
| `strombom` | Strömbom et al. (2014) collect/drive heuristic | Analytic baseline |
|
||||||
| `bc` | Behaviour cloning of the Strömbom teacher | Imitation learning result |
|
| `sequential` | Single-target "pin-and-push" | Alternative analytic baseline |
|
||||||
|
| `bc` | Behaviour cloning of the universal teacher | Imitation learning result |
|
||||||
| `rl` | KL-regularised PPO fine-tune of `bc` | Reward-driven refinement |
|
| `rl` | KL-regularised PPO fine-tune of `bc` | Reward-driven refinement |
|
||||||
|
|
||||||
`sequential` (single-target pin-and-push) is kept as an alternative
|
|
||||||
analytic baseline.
|
|
||||||
|
|
||||||
## Perception
|
## Perception
|
||||||
|
|
||||||
The dog perceives sheep **only through its front-mounted 140° LiDAR**
|
The dog perceives sheep **only through its front-mounted 140° LiDAR**
|
||||||
@@ -52,27 +52,39 @@ Privileged ground-truth perception is available for ablation —
|
|||||||
# 1. Set up the Python env (any venv with PyTorch + SB3)
|
# 1. Set up the Python env (any venv with PyTorch + SB3)
|
||||||
pip install -r training/requirements.txt
|
pip install -r training/requirements.txt
|
||||||
|
|
||||||
# 2. Smoke test (70 pytest cases, < 1 s)
|
# 2. Smoke test (126 pytest cases, < 1 s)
|
||||||
make test
|
make test
|
||||||
|
|
||||||
# 3. Reproduce the full pipeline (~30–60 min CPU)
|
# 3. Reproduce a full pipeline (DRIVE+WORLD specific, ~1 h CPU)
|
||||||
make # demos -> bc -> rl -> eval
|
make DRIVE=differential WORLD=field # demos -> bc -> rl -> eval
|
||||||
|
make DRIVE=differential WORLD=field_round
|
||||||
|
make DRIVE=mecanum WORLD=field # see note below
|
||||||
|
make train_all # all 4 combos sequentially
|
||||||
|
|
||||||
# Individual stages (each rebuilds upstream artefacts if missing):
|
# Individual stages (each rebuilds upstream artefacts if missing):
|
||||||
make bc_demos # sim demos
|
make DRIVE=differential WORLD=field bc_demos # sim demos
|
||||||
make bc # behaviour clone
|
make DRIVE=differential WORLD=field bc # behaviour clone
|
||||||
make rl # KL-PPO fine-tune
|
make DRIVE=differential WORLD=field rl # KL-PPO fine-tune
|
||||||
make eval # 10-seed env eval of rl
|
make DRIVE=differential WORLD=field eval # 10-seed env eval
|
||||||
|
|
||||||
# 4. Run in Webots
|
# 4. Run in Webots
|
||||||
make webots N=10 MODE=bc # behaviour-cloned MLP
|
tools/run_webots.sh 10 bc differential field # BC, diff, rect field
|
||||||
make webots N=10 MODE=rl # KL-PPO fine-tune
|
tools/run_webots.sh 10 rl differential field_round # RL, diff, round field
|
||||||
make webots N=10 MODE=strombom # analytic baseline
|
tools/run_webots.sh 5 strombom differential field # analytic baseline
|
||||||
# (or invoke directly: tools/run_webots.sh 10 rl)
|
HERDING_USE_GT=1 tools/run_webots.sh 5 strombom differential field
|
||||||
|
# GT bypass for ablation
|
||||||
```
|
```
|
||||||
|
|
||||||
`make help` lists every target and the overridable hyperparameters
|
`make help` lists every target and the overridable hyperparameters.
|
||||||
(e.g. `make rl PPO_STEPS=2000000 KL=0.02`).
|
|
||||||
|
**Mecanum note**: the `ShepherdDogMecanum.proto` uses physical roller
|
||||||
|
hinges in Webots (committed 2026-05-16). The Webots calibration shows
|
||||||
|
a ~60% strafe efficiency and ~28% backward bleed compared to textbook
|
||||||
|
mecanum; the gym kinematics in `HERDING_MEC_WEBOTS` are tuned to
|
||||||
|
match. **Mecanum BC/RL policies need to be retrained against this
|
||||||
|
preset** — see `mecanum_proto_gap.md` in `memory/` for the 3-command
|
||||||
|
flow. The v1 policies in `training/runs/{bc,rl}_mecanum_*` predate the
|
||||||
|
proto rewrite and will not herd reliably in Webots until retrained.
|
||||||
|
|
||||||
## Documentation map
|
## Documentation map
|
||||||
|
|
||||||
@@ -87,56 +99,67 @@ make webots N=10 MODE=strombom # analytic baseline
|
|||||||
|
|
||||||
```
|
```
|
||||||
herding/ — perception / control / world primitives
|
herding/ — perception / control / world primitives
|
||||||
world/ — environment-side physics & geometry
|
config.py — frozen dataclasses for all tunable parameters;
|
||||||
geometry.py field/pen constants, robot specs
|
named presets HERDING_DEFAULT / HERDING_WEBOTS /
|
||||||
diffdrive.py differential-drive kinematics
|
HERDING_MEC_WEBOTS
|
||||||
|
world/
|
||||||
|
geometry.py field/pen constants, world-shape switch
|
||||||
|
diffdrive.py differential + mecanum kinematics
|
||||||
flocking_sim.py Reynolds + Strömbom 2014 sheep dynamics
|
flocking_sim.py Reynolds + Strömbom 2014 sheep dynamics
|
||||||
perception/ — LiDAR → tracked-sheep pipeline
|
perception/
|
||||||
lidar_sim.py fast 2D raycast for the env
|
lidar_sim.py fast 2D raycast for the gym env
|
||||||
lidar_perception.py scan → world-frame cluster centroids + filters
|
lidar_perception.py scan → world-frame cluster centroids + filters
|
||||||
sheep_tracker.py multi-target NN tracker with FOV memory
|
sheep_tracker.py multi-target NN tracker with FOV memory
|
||||||
|
and the consensus-promotion stage
|
||||||
obs.py 32-D order-invariant observation builder
|
obs.py 32-D order-invariant observation builder
|
||||||
control/ — every dog mode's action source
|
control/
|
||||||
strombom.py canonical CoM collect/drive heuristic
|
strombom.py canonical CoM collect/drive heuristic
|
||||||
|
(round-world aware)
|
||||||
sequential.py single-target "pin-and-push" alternative
|
sequential.py single-target "pin-and-push" alternative
|
||||||
active_scan.py wraps a base teacher with opening rotation +
|
universal.py teacher used for BC demo collection
|
||||||
walk-to-centre fallback
|
(Strömbom + mecanum omega + straggler recovery)
|
||||||
|
active_scan.py rotate-on-empty + walk-to-centre fallback
|
||||||
modulation.py shared near-sheep speed-modulation helper
|
modulation.py shared near-sheep speed-modulation helper
|
||||||
|
|
||||||
controllers/
|
controllers/
|
||||||
sheep/sheep.py — Webots sheep controller (uses herding.world.flocking_sim)
|
sheep/sheep.py — Webots sheep controller
|
||||||
shepherd_dog/
|
shepherd_dog/
|
||||||
shepherd_dog.py — Webots dog controller, mode-switched
|
shepherd_dog.py — Webots dog controller, mode-switched
|
||||||
policy_loader.py — lazy SB3 policy loader (auto-detects frame stack)
|
policy_loader.py — SB3 PPO / RecurrentPPO loader with frame stack
|
||||||
|
|
||||||
training/
|
training/
|
||||||
herding_env.py — Gymnasium env (LiDAR + tracker by default)
|
herding_env.py — Gymnasium env (LiDAR + tracker by default)
|
||||||
bc/collect.py — sim demos via the active-scan teacher
|
bc/collect.py — sim demos via the active-scan teacher
|
||||||
bc/pretrain.py — supervised BC of (obs, action) demos into MLP
|
bc/pretrain.py — supervised BC into MLP
|
||||||
rl/train.py — KL-regularised PPO fine-tune of BC
|
rl/train.py — KL-regularised PPO fine-tune of BC
|
||||||
|
rl/train_lstm.py — RecurrentPPO variant (ablation)
|
||||||
eval.py — analytic + learned policy comparison harness
|
eval.py — analytic + learned policy comparison harness
|
||||||
bc/demos.npz — collected demonstrations (gitignored)
|
|
||||||
runs/ — checkpoints (whitelisted in .gitignore)
|
runs/ — checkpoints (whitelisted in .gitignore)
|
||||||
requirements.txt
|
requirements.txt
|
||||||
|
|
||||||
tests/
|
tests/ — 126 pytest cases, < 1 s on CPU
|
||||||
conftest.py — pytest setup (adds project root to sys.path)
|
|
||||||
test_geometry.py — geometric predicates + constants
|
|
||||||
test_diffdrive.py — kinematics and (vx, vy) → wheel-speed map
|
|
||||||
test_obs.py — observation builder (shape, normalisation, order)
|
|
||||||
test_control.py — speed modulation + analytic teachers + active scan
|
|
||||||
test_perception.py — LiDAR sim + clustering + tracker
|
|
||||||
test_env.py — Gymnasium contract + determinism + reward
|
|
||||||
|
|
||||||
tools/
|
tools/
|
||||||
run_webots.sh — launch Webots with N sheep + chosen mode
|
run_webots.sh — launch Webots with N sheep + chosen mode + world
|
||||||
|
webots_sweep.sh — headless sweep across modes × drives × worlds
|
||||||
|
webots_sweep_gt.sh — same with HERDING_USE_GT=1 (perfect perception)
|
||||||
|
calibrate_mecanum.sh — measure mecanum body velocity vs gym prediction
|
||||||
|
gen_mecanum_wheels.py — regenerate the 32 mecanum roller hinges
|
||||||
|
benchmark_lidar.py — tracker quality benchmark
|
||||||
|
|
||||||
Makefile — pipeline orchestrator (make / make rl / make test / …)
|
Makefile — pipeline orchestrator
|
||||||
|
(make DRIVE=… WORLD=… rl, make train_all, …)
|
||||||
|
|
||||||
worlds/
|
worlds/
|
||||||
field.wbt — main world (3 m gate, external pen)
|
field.wbt — rectangular world (3 m gate, external pen)
|
||||||
|
field_round.wbt — circular world (radius 15 m, same pen)
|
||||||
|
|
||||||
|
protos/
|
||||||
|
Sheep.proto — sheep robot
|
||||||
|
ShepherdDog.proto — diff-drive dog, 140° LiDAR
|
||||||
|
ShepherdDog360.proto — diff-drive dog, 360° LiDAR (ablation)
|
||||||
|
ShepherdDogMecanum.proto — 4-wheel mecanum with physical roller hinges
|
||||||
|
|
||||||
protos/ — Sheep / ShepherdDog robot definitions
|
|
||||||
docs/project.md — original course proposal/goals
|
docs/project.md — original course proposal/goals
|
||||||
```
|
```
|
||||||
|
|
||||||
@@ -151,48 +174,57 @@ scattering the flock. Direction (intent) is preserved.
|
|||||||
All modes also share the same EMA action smoother in
|
All modes also share the same EMA action smoother in
|
||||||
`controllers/shepherd_dog/shepherd_dog.py:ACTION_SMOOTH = 0.55`.
|
`controllers/shepherd_dog/shepherd_dog.py:ACTION_SMOOTH = 0.55`.
|
||||||
|
|
||||||
## Results — env eval, 10 seeds × n=1..10
|
## Results — Webots end-to-end, canonical 140° LiDAR
|
||||||
|
|
||||||
`max_steps=15000`, full-field spawn distribution. Success rate per
|
Each cell = "OK at step X" means the dog penned all N sheep in a single
|
||||||
flock size, then mean steps over successful seeds.
|
trial, `HERDING_USE_GT=0` (LiDAR perception, no ground truth bypass),
|
||||||
|
default consensus tracker.
|
||||||
|
|
||||||
### Success rate (%)
|
### Differential drive
|
||||||
|
|
||||||
| n | Strömbom | `bc` | `rl` |
|
| Mode | World | n=5 | n=10 |
|
||||||
|---:|---:|---:|---:|
|
|---|---|---:|---:|
|
||||||
| 1 | 30 | 80 | **90** |
|
| Strömbom | field | 7528 | 11620 |
|
||||||
| 2 | 90 | 50 | **90** |
|
| Strömbom | field_round | 8611 | 10339 |
|
||||||
| 3 | 60 | 90 | **90** |
|
| Sequential | field | 7135 | 16843 |
|
||||||
| 4 | 40 | 80 | **90** |
|
| Sequential | field_round | 6019 | 8494 |
|
||||||
| 5 | 60 | 70 | **100** |
|
| BC | field | 11698 | 15079 |
|
||||||
| 6 | 30 | 80 | 80 |
|
| BC | field_round | 7234 | 11320 |
|
||||||
| 7 | 70 | 80 | **100** |
|
| RL | field | 10039 | 13954 |
|
||||||
| 8 | 30 | 100 | **100** |
|
| RL | field_round | 5803 | 9151 |
|
||||||
| 9 | 40 | 90 | **100** |
|
|
||||||
| 10 | 50 | 100 | **100** |
|
|
||||||
|
|
||||||
### Mean penned per episode (out of n)
|
RL is **strictly faster than BC** on every comparable cell.
|
||||||
|
|
||||||
| n | Strömbom | `bc` | `rl` |
|
### LiDAR vs GT bypass (diff drive)
|
||||||
|---:|---:|---:|---:|
|
|
||||||
| 1 | 0.30 | 0.80 | **0.90** |
|
|
||||||
| 5 | 3.90 | 4.10 | **5.00** |
|
|
||||||
| 8 | 4.20 | 8.00 | **8.00** |
|
|
||||||
| 10 | 7.40 | 10.00 | **10.00** |
|
|
||||||
|
|
||||||
### Takeaways
|
GT bypass replaces the LiDAR tracker with perfect emitter positions.
|
||||||
|
LiDAR is the default; GT is a perception ablation
|
||||||
|
(`HERDING_USE_GT=1`):
|
||||||
|
|
||||||
- **BC clearly beats Strömbom** under realistic LiDAR conditions (full
|
| Mode | World | n=5 LiDAR | n=5 GT | n=10 LiDAR | n=10 GT |
|
||||||
field, partial observability). Strömbom struggles on small flocks
|
|---|---|---:|---:|---:|---:|
|
||||||
where a single sheep can spawn beyond the LiDAR's 12 m range; BC
|
| Strömbom | field | 7528 | **5254** | 11620 | **7342** |
|
||||||
learned active perception from the demos.
|
| Strömbom | field_round | 8611 | **3631** | 10339 | **7084** |
|
||||||
- **RL refines BC** without regressing on any cell. Ties or beats BC
|
| Sequential | field | **7135** | 11092 | 16843 | **8698** |
|
||||||
at every flock size; biggest gains at n=1 and n=4 where BC's
|
| Sequential | field_round | 6019 | **3454** | 8494 | **7324** |
|
||||||
imitation of Strömbom's drive heuristic was sub-optimal.
|
|
||||||
- **Aggressive reward shaping doesn't help** — a more aggressive
|
GT is generally faster (perfect perception → fewer wasted steps).
|
||||||
variant (β=0.02, W_TIME=-0.1, W_IMITATE=0, 3 M steps) trained as
|
Sequential n=5 / field is the one cell where GT is *slower* — its
|
||||||
an ablation was strictly worse than the conservative tune shipped
|
straggler heuristic appears to over-correct when the dog has full
|
||||||
here (β=0.05, W_IMITATE=0.5, 1 M steps).
|
information.
|
||||||
|
|
||||||
|
### Mecanum (differential is the headline)
|
||||||
|
|
||||||
|
The `ShepherdDogMecanum.proto` was rewritten on 2026-05-16 with 32
|
||||||
|
physical roller hinges, giving true omnidirectional motion in Webots
|
||||||
|
(`tools/calibrate_mecanum.sh` confirms the X-pattern). The mecanum
|
||||||
|
calibration shows ~60% strafe efficiency vs textbook (vs ~89% on
|
||||||
|
forward), so v1 mecanum BC/RL policies trained on textbook gym
|
||||||
|
mecanum no longer herd reliably. The fix is staged but not run:
|
||||||
|
the gym now has `HERDING_MEC_WEBOTS` which matches Webots' physical
|
||||||
|
mecanum, and `training/bc/collect.py` / `training/rl/train.py` auto-
|
||||||
|
select this preset for mecanum runs. Retraining (≈ 2 h per combo,
|
||||||
|
4 combos) is the documented future step.
|
||||||
|
|
||||||
## License
|
## License
|
||||||
|
|
||||||
|
|||||||
@@ -80,48 +80,3 @@ def compute_action(dog_xy, sheep_positions, pen_target=PEN_ENTRY):
|
|||||||
|
|
||||||
ax, ay = _unit(tx - dog_xy[0], ty - dog_xy[1])
|
ax, ay = _unit(tx - dog_xy[0], ty - dog_xy[1])
|
||||||
return ax, ay, mode
|
return ax, ay, mode
|
||||||
|
|
||||||
|
|
||||||
def compute_action_debug(dog_xy, sheep_positions, pen_target=PEN_ENTRY):
|
|
||||||
"""``compute_action`` plus a debug dict."""
|
|
||||||
active = [(x, y) for (x, y) in sheep_positions.values() if _is_active(x, y)]
|
|
||||||
if not active:
|
|
||||||
return 0.0, 0.0, "idle", {
|
|
||||||
"n_active": 0, "phase": "idle", "radius": 0.0, "threshold": 0.0,
|
|
||||||
"com_x": 0.0, "com_y": 0.0,
|
|
||||||
"target_x": dog_xy[0], "target_y": dog_xy[1],
|
|
||||||
}
|
|
||||||
|
|
||||||
n = len(active)
|
|
||||||
com_x = sum(p[0] for p in active) / n
|
|
||||||
com_y = sum(p[1] for p in active) / n
|
|
||||||
dists = [math.hypot(p[0] - com_x, p[1] - com_y) for p in active]
|
|
||||||
radius = max(dists)
|
|
||||||
threshold = F_FACTOR * math.sqrt(n)
|
|
||||||
|
|
||||||
if n <= STRAGGLER_THRESHOLD:
|
|
||||||
sx, sy = min(active,
|
|
||||||
key=lambda p: math.hypot(p[0] - pen_target[0],
|
|
||||||
p[1] - pen_target[1]))
|
|
||||||
ux, uy = _unit(sx - pen_target[0], sy - pen_target[1])
|
|
||||||
tx, ty = sx + DELTA_TARGET * ux, sy + DELTA_TARGET * uy
|
|
||||||
mode = "targeted"
|
|
||||||
|
|
||||||
elif radius > threshold:
|
|
||||||
idx = max(range(n), key=lambda i: dists[i])
|
|
||||||
sx, sy = active[idx]
|
|
||||||
ux, uy = _unit(sx - com_x, sy - com_y)
|
|
||||||
tx, ty = sx + DELTA_COLLECT * ux, sy + DELTA_COLLECT * uy
|
|
||||||
mode = "collect"
|
|
||||||
|
|
||||||
else:
|
|
||||||
ux, uy = _unit(com_x - pen_target[0], com_y - pen_target[1])
|
|
||||||
tx, ty = com_x + DELTA_DRIVE * ux, com_y + DELTA_DRIVE * uy
|
|
||||||
mode = "drive"
|
|
||||||
|
|
||||||
ax, ay = _unit(tx - dog_xy[0], ty - dog_xy[1])
|
|
||||||
return ax, ay, mode, {
|
|
||||||
"n_active": n, "phase": mode, "radius": radius, "threshold": threshold,
|
|
||||||
"com_x": com_x, "com_y": com_y,
|
|
||||||
"target_x": tx, "target_y": ty,
|
|
||||||
}
|
|
||||||
|
|||||||
@@ -76,40 +76,3 @@ def compute_action(dog_xy, sheep_positions, pen_target=PEN_ENTRY):
|
|||||||
|
|
||||||
ax, ay = _unit(tx - dog_xy[0], ty - dog_xy[1])
|
ax, ay = _unit(tx - dog_xy[0], ty - dog_xy[1])
|
||||||
return ax, ay, mode
|
return ax, ay, mode
|
||||||
|
|
||||||
|
|
||||||
def compute_action_debug(dog_xy, sheep_positions, pen_target=PEN_ENTRY):
|
|
||||||
"""``compute_action`` plus a small debug dict (CoM, target, radius)."""
|
|
||||||
active = [(x, y) for (x, y) in sheep_positions.values() if _is_active(x, y)]
|
|
||||||
if not active:
|
|
||||||
return 0.0, 0.0, "idle", {
|
|
||||||
"n_active": 0, "radius": 0.0, "threshold": 0.0,
|
|
||||||
"com_x": 0.0, "com_y": 0.0,
|
|
||||||
"target_x": dog_xy[0], "target_y": dog_xy[1],
|
|
||||||
}
|
|
||||||
|
|
||||||
n = len(active)
|
|
||||||
com_x = sum(p[0] for p in active) / n
|
|
||||||
com_y = sum(p[1] for p in active) / n
|
|
||||||
dists = [math.hypot(p[0] - com_x, p[1] - com_y) for p in active]
|
|
||||||
radius = max(dists)
|
|
||||||
threshold = F_FACTOR * math.sqrt(n)
|
|
||||||
|
|
||||||
if radius > threshold:
|
|
||||||
idx = max(range(n), key=lambda i: dists[i])
|
|
||||||
sx, sy = active[idx]
|
|
||||||
ux, uy = _unit(sx - com_x, sy - com_y)
|
|
||||||
tx, ty = sx + DELTA_COLLECT * ux, sy + DELTA_COLLECT * uy
|
|
||||||
mode = "collect"
|
|
||||||
else:
|
|
||||||
ux, uy = _unit(com_x - pen_target[0], com_y - pen_target[1])
|
|
||||||
tx, ty = com_x + DELTA_DRIVE * ux, com_y + DELTA_DRIVE * uy
|
|
||||||
mode = "drive"
|
|
||||||
|
|
||||||
ax, ay = _unit(tx - dog_xy[0], ty - dog_xy[1])
|
|
||||||
dbg = {
|
|
||||||
"n_active": n, "radius": radius, "threshold": threshold,
|
|
||||||
"com_x": com_x, "com_y": com_y,
|
|
||||||
"target_x": tx, "target_y": ty,
|
|
||||||
}
|
|
||||||
return ax, ay, mode, dbg
|
|
||||||
|
|||||||
@@ -207,17 +207,3 @@ def compute_action(dog_xy, dog_heading, sheep_positions,
|
|||||||
omega = max(-1.0, min(1.0, OMEGA_GAIN * err / math.pi))
|
omega = max(-1.0, min(1.0, OMEGA_GAIN * err / math.pi))
|
||||||
|
|
||||||
return ax, ay, omega, mode
|
return ax, ay, omega, mode
|
||||||
|
|
||||||
|
|
||||||
def compute_action_diff(dog_xy, dog_heading, sheep_positions,
|
|
||||||
pen_target=PEN_ENTRY):
|
|
||||||
"""Compatibility wrapper returning ``(vx, vy, mode)`` — same as Strömbom.
|
|
||||||
|
|
||||||
Use this when plugging into existing differential-drive code that
|
|
||||||
doesn't expect omega.
|
|
||||||
"""
|
|
||||||
vx, vy, _omega, mode = compute_action(
|
|
||||||
dog_xy, dog_heading, sheep_positions, pen_target,
|
|
||||||
drive_mode="differential",
|
|
||||||
)
|
|
||||||
return vx, vy, mode
|
|
||||||
|
|||||||
@@ -1,7 +0,0 @@
|
|||||||
make[1]: Entering directory '/run/host/home/johnnyf/Documents/Projects/TIR/project'
|
|
||||||
make DRIVE=differential WORLD=field
|
|
||||||
make[2]: Entering directory '/run/host/home/johnnyf/Documents/Projects/TIR/project'
|
|
||||||
python -m training.eval --policy training/runs/rl_differential_field \
|
|
||||||
--max-flock 10 --max-steps 15000 --n-seeds 10 \
|
|
||||||
--drive-mode differential --world field
|
|
||||||
make[2]: Leaving directory '/run/host/home/johnnyf/Documents/Projects/TIR/project'
|
|
||||||
Binary file not shown.
Binary file not shown.
Binary file not shown.
Reference in New Issue
Block a user