Commit Graph

38 Commits

Author SHA1 Message Date
Johnny Fernandes 0a27ad9a26 Full retrain pipeline + hybrid policy set
Ran end-to-end clean retrain + gym eval + 24-cell Webots sweep
(tools/full_pipeline.sh). Results:

  Differential — all 16 cells pen N/N. Updated policies committed.
  Mecanum     — new training stochastically regressed (only 2/8 cells
                vs the v2 baseline's 4/8). v2 baseline mec policies
                are RESTORED in this commit (training/runs/{bc,rl}_
                mecanum_*) — they remain the deliverable.

The retrain pipeline itself is committed for reproducibility
(tools/full_pipeline.sh: clean → train_all → eval_all → 24-cell
Webots sweep). The v2 mec policies are also backed up locally to
_backup_pretrain/mec_v2_baseline/ (gitignored).

Verified after restore:
  bc mec field_round n=10 → 10/10 in 147 s sim
  rl diff field n=5      → 5/5  in 137 s sim
2026-05-20 08:07:39 +00:00
Johnny Fernandes 27c0f65722 Mecanum Webots via Supervisor kinematic injection
Replace the failing ODE-rolled mecanum chassis dynamics with a
Supervisor.setVelocity call that uses the gym mecanum forward
kinematics formula directly. Wheel motors still spin (visual);
chassis motion comes from the gym model so training and deployment
match by construction.

Results (seed=42, n=10 sheep): BC + RL mecanum pen 10/10 in both
field and field_round. n=5 mecanum cells still 0/5 due to tracker
phantoms anchored to wall corners under the 360° LiDAR — documented
in docs/status.md as the remaining gap.

Cleanup: drop deploy-time hacks (HERDING_HEADING_*, HERDING_OMEGA_CLAMP,
HERDING_TRACKER_*) that were workarounds for the old ODE chaos;
revert the proto inertiaMatrix, roller dampingConstant, and reduced
motor torque since they no longer carry load; refresh comments
around the mecanum config presets.
2026-05-18 22:46:37 +00:00
Johnny Fernandes a584a034e9 Project-wide cleanup: gitignore, dead code, stale artifacts, README
Repo hygiene pass after a long working session.

Files removed:
* stage1_train.log — runtime training log (~125 KB), shouldn't have
  been tracked.
* training/bc/demos.npz — orphan default-name demos file from before
  the world+drive-suffixed naming convention took over; no script
  references it.
* training/runs/bc_dagger{1,2}_differential_field/policy.zip — failed
  DAgger experiment artifacts. Per `memory/dagger_results.md` the
  whole DAgger experiment hit 0/5 on Webots transfer; these checkpoints
  have no consumers.

Untracked-but-deleted (no git change) — also cleaned from disk:
* Root-level runtime logs (43 *.log files, all unused — gitignored now).
* training/bc/{combined,dagger}*.npz (5 huge demo blobs, 2.6 GB
  reclaimed; not committed).
* training/bc/v1/ (2.6 GB backup of pre-DAgger demos; reclaimed).
* training/runs/at_20260426_*/ (orphan timestamped runs; reclaimed).
* All __pycache__/.

Dead code removed:
* `herding/control/strombom.py::compute_action_debug` — no callers
  anywhere in the repo.
* `herding/control/sequential.py::compute_action_debug` — same.
* `herding/control/universal.py::compute_action_diff` — same.

.gitignore extended to cover:
* All *.log files (training/eval/webots logs are runtime artifacts).
* training/bc/*.npz (re-collectable on demand by `make bc_demos`).
* training/bc/v1/.
* .pytest_cache, *.pyc, .claude/.

README refreshed:
* Mecanum + round-world coverage in the headline.
* Quick-start updated for DRIVE/WORLD-suffixed Makefile targets,
  GT-bypass example, and the mecanum-retrain caveat.
* Layout reflects the actual current tree (config.py, both protos,
  both worlds, all tools).
* Results table replaced with the Webots end-to-end numbers from
  the 2026-05-16 sweep (8/8 diff combos + LiDAR/GT comparison).

Verification: 126 pytest cases still pass (was 126 going in — no
test-coverage regression from the dead-code removal).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 01:38:19 +00:00
Johnny Fernandes 876e14e74f LSTM (RecurrentPPO) experiment + recurrent policy support
Adds RecurrentPPO-based training as an alternative to MLP+frame-stack.
The LSTM gives the policy unbounded temporal memory, addressing the
partial-obs failure mode of the 140° Webots LiDAR (tracker briefly
empties when the dog turns; sporadic phantom tracks confuse decisions).

* training/rl/train_lstm.py: from-scratch RecurrentPPO trainer (no BC
  init, no KL term since there's no reference). Uses HERDING_WEBOTS
  preset so the obs distribution matches deployment.
* training/eval.py: auto-detects RecurrentPPO zips, maintains LSTM
  hidden state across steps, resets between episodes.
* controllers/shepherd_dog/policy_loader.py: PolicyHandle supports
  recurrent policies — state managed inside, reset_recurrent() exposed.

Result on diff/field after 3M steps:
- Gym (default 360°): 69% avg success across n=1..10
- Gym (HERDING_WEBOTS preset, training env): 2% — penning 3-4/5 but
  rarely all 5
- Webots LiDAR 140°: 0/5 (same wall as DAgger and v1 policies)

Conclusion: architectural changes (LSTM vs MLP) don't close the
perception sim-to-real gap. The gym LiDAR sim doesn't faithfully
reproduce Webots phantom-track distribution; any policy trained on the
gym proxy fails to handle real Webots phantoms regardless of
architecture. Closing this gap requires either modeling Webots phantom
patterns in the gym sim (multi-day work) or Webots-in-the-loop
training (very slow). See memory/lstm_results.md for details.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 19:22:32 +00:00
Johnny Fernandes dd5ac669e5 Webots sim-to-real fixes, DAgger pipeline, 360° proto variant
Today's session worked across the full Webots delivery stack — found and
fixed a cluster of bugs blocking the BC/RL transfer, then explored
training-side mitigations for the residual perception gap.

Bug fixes:
- Makefile FP_RATE default 2.0 → 0.0: BC demos used fp_rate=0 but RL
  fine-tune defaulted to fp_rate=2, poisoning the BC obs distribution
  and stalling PPO at 0% success across 1.46M+ steps.
- controllers/{shepherd_dog,sheep}/runtime.ini: Webots was launching
  controllers under system python3 (no numpy) and they were crashing
  silently. Pinned to the conda tir env.
- herding/config.py HERDING_WEBOTS preset: pen_latch_depth 0.5 → 2.0,
  max_new_tracks_per_step 3 → 1, static_reject 0.8 → 1.2. Stops phantom
  FPs near the gate from latching as permanently-penned tracks.
- herding/perception/sheep_tracker.py: penned tracks now decay at
  forget_steps × 8 instead of living forever. Adds get_positions
  min_freshness filter for deploy-time use.

Training/eval matches deployment:
- training/bc/collect.py: --dagger-policy flag for DAgger rollouts
  (policy drives, teacher labels) + --use-webots-preset for matched
  140° tracker + DR config.
- controllers/shepherd_dog/shepherd_dog.py: scan-fallback (0, 0.6) when
  BC/RL sees empty sheep_positions — recovers from FOV gaps.

Tooling:
- tools/dagger_round.sh: one-shot DAgger round (collect + concat + bc).
- tools/webots_sweep_gt.sh: full sweep with HERDING_USE_GT=1 for the
  perception-gap diagnosis matrix.
- protos/ShepherdDog360.proto: 360° FOV variant for the FOV-ablation
  comparison. Canonical proto stays at 140° per project spec.

Artifacts: v1 BC/RL policies for all 4 (drive × world) combos trained
in clean gym (success: diff/field 90-100%, diff/round 58%, mec/field
60-100%, mec/round 50-100%). DAgger r1/r2 BCs for diff/field show
12%→38% progression on gym HERDING_WEBOTS proxy but did not close
to actual Webots LiDAR (0/5 throughout). Next: LSTM policy or
learned tracker per the project-state memory.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 17:21:02 +00:00
Johnny Fernandes 0f807003a5 Results from last checkpoint 2026-05-13 20:26:18 +00:00
Johnny Fernandes be58ad2054 Results from last checkpoinr 2026-05-13 07:49:17 +00:00
Johnny Fernandes 5c2ee4bba5 Checkpoint 8 2026-05-12 22:41:03 +01:00
Johnny Fernandes b457155538 Checkpoint 5 - incomplete 2026-05-11 10:35:39 +01:00
Johnny Fernandes 6688325d89 Checkpoint 4 2026-05-11 00:42:52 +01:00
Johnny Fernandes 2a6db038df Checkpoint 3 2026-05-10 12:46:14 +01:00
Johnny Fernandes 1bb9415414 Checkpoint 2 2026-05-07 22:00:10 +01:00
Johnny Fernandes 57b1735e1a Mimics webots approach better + debug. Lucky number 2026-04-26 20:36:36 +01:00
Johnny Fernandes deeae3193e Mimics webots approach better + debug. Lucky number 2026-04-26 18:55:53 +01:00
Johnny Fernandes 1af7d03ce2 Mimic webots physics 2026-04-26 18:22:26 +01:00
Johnny Fernandes 8110fc3143 Run n3 2026-04-26 16:42:55 +00:00
Johnny Fernandes 27fe6d1bf5 Run v3 2026-04-26 16:01:30 +00:00
Johnny Fernandes a561f8a697 Run v2 2026-04-26 13:32:48 +00:00
Johnny Fernandes acf0810425 Test26_1200 2026-04-26 11:04:23 +00:00
Johnny Fernandes 61f8a7db15 Cleanup and new approach 2026-04-26 01:50:01 +01:00
Johnny Fernandes 6612dbc1ba Test25_2330 2026-04-25 22:32:06 +00:00
Johnny Fernandes 841f5fa520 Test25_2000 2026-04-25 19:17:40 +00:00
Johnny Fernandes 5005128c07 Test25_1820 2026-04-25 17:19:02 +00:00
Johnny Fernandes 75d030cb49 Test25_1800 2026-04-25 17:00:19 +00:00
Johnny Fernandes 3a5decb185 Test25_1700 2026-04-25 16:02:10 +00:00
Johnny Fernandes 4350c7d320 Test25_1600 2026-04-25 15:06:06 +00:00
Johnny Fernandes e7c1d82f5c Test25_1315 2026-04-25 12:14:36 +00:00
Johnny Fernandes 19bfac9bd9 Test25_1245 2026-04-25 11:47:37 +00:00
Johnny Fernandes 433652cb94 Test25_1215 2026-04-25 11:16:12 +00:00
Johnny Fernandes 062de676c9 Test25_0030 2026-04-24 23:37:03 +00:00
Johnny Fernandes 5a61a424ee Test25_0010 2026-04-24 23:10:33 +00:00
Johnny Fernandes eb29cdf402 Test24_2100 2026-04-24 20:08:25 +00:00
Johnny Fernandes efe996a5a9 Test24_1900 2026-04-24 18:00:20 +00:00
Johnny Fernandes 65d881aa0f Test24_1800 2026-04-24 17:00:14 +00:00
Johnny Fernandes c2da9c10e4 Test24_1725 2026-04-24 16:24:54 +00:00
Johnny Fernandes 1e3b67d194 Test24_0150 2026-04-24 00:50:17 +00:00
Johnny Fernandes 9e13eb060d Classic approach results 2026-04-23 17:23:57 +00:00
Johnny Fernandes ea6e66b16c Classic approach results 2026-04-23 12:43:47 +00:00