Checkpoint 8

2026-05-12 22:41:03 +01:00
parent a01a5c9cef
commit 5c2ee4bba5
31 changed files with 3189 additions and 380 deletions
@@ -1,4 +1,9 @@
-# Training pipeline
+# Training and Evaluation Details
+
+This file is the command-level companion to the root README. It focuses
+on data collection, BC, PPO fine-tuning, evaluation flags, and generated
+artifacts; use the root README for the high-level architecture and
+Webots demo quick start.

 Two stages, strictly sequential:

@@ -26,16 +31,6 @@ runs/              — checkpoints (whitelisted entries in top-level .gitignore)
 run with ``python -m pytest tests/``.)
 ```

-## Setup
-
-```
-pip install -r requirements.txt
-```
-
-CPU is the default and recommended device — SB3 PPO with an MLP policy
-of this size runs faster on CPU than GPU because the bottleneck is
-rollout collection, not gradient compute.
-
 ## End-to-end pipeline

 The simplest way to run everything is the Makefile at the project
@@ -93,12 +88,3 @@ python -m training.eval --policy strombom    --max-flock 10 --max-steps 15000 --
 python -m training.eval --policy sequential  --max-flock 10 --max-steps 15000 --n-seeds 10
 ```

-## Webots inference
-
-```
-tools/run_webots.sh 10 bc          # or rl, strombom, sequential
-```
-
-The dog controller loads `runs/bc` for `bc` mode and `runs/rl` for
-`rl` mode. Override with `HERDING_POLICY_DIR=…` for a specific
-checkpoint.