Checkpoint 8

Johnny Fernandes
2026-05-12 22:41:03 +01:00
parent a01a5c9cef
commit 5c2ee4bba5
31 changed files with 3189 additions and 380 deletions
+6 -20
@@ -1,4 +1,9 @@
-# Training pipeline
+# Training and Evaluation Details
+This file is the command-level companion to the root README. It focuses
+on data collection, BC, PPO fine-tuning, evaluation flags, and generated
+artifacts; use the root README for the high-level architecture and
+Webots demo quick start.
 Two stages, strictly sequential:
@@ -26,16 +31,6 @@ runs/ — checkpoints (whitelisted entries in top-level .gitignore)
 run with ``python -m pytest tests/``.)
 ```
-## Setup
-```
-pip install -r requirements.txt
-```
-CPU is the default and recommended device — SB3 PPO with an MLP policy
-of this size runs faster on CPU than GPU because the bottleneck is
-rollout collection, not gradient compute.
 ## End-to-end pipeline
 The simplest way to run everything is the Makefile at the project
@@ -93,12 +88,3 @@ python -m training.eval --policy strombom --max-flock 10 --max-steps 15000 --
 python -m training.eval --policy sequential --max-flock 10 --max-steps 15000 --n-seeds 10
 ```
-## Webots inference
-```
-tools/run_webots.sh 10 bc # or rl, strombom, sequential
-```
-The dog controller loads `runs/bc` for `bc` mode and `runs/rl` for
-`rl` mode. Override with `HERDING_POLICY_DIR=…` for a specific
-checkpoint.