Generator notebooks with good names + README

2026-05-14 20:20:46 +01:00
parent f46320f81e
commit 3bff7eefb0
9 changed files with 1127 additions and 504 deletions
@@ -1,125 +1,166 @@
-# DRL_PROJ — DeepFake Detection
+# Deep Learning Face Project

-Deep learning project for binary deepfake detection on the DeepFakeFace dataset.
+This repository contains a two-part deep learning project on the
+DeepFakeFace (DFF) dataset:

-## Project structure
+1. **Classifier:** detect whether a face image is real or fake.
+2. **Generator:** train generative models that produce new fake face images.

-```
+The project is written as an experimental report. The notebooks are the main
+deliverable: they show the pipeline, the intermediate failures, the ablations,
+the decisions, and the final models. Read them in order.
+
+## Project Story
+
+The work follows the same principle in both parts: start with a simple
+baseline, inspect what fails, change one important factor at a time, and keep
+the evidence tied to saved logs and saved artifacts.
+
+For the **classifier**, the story moves from dataset understanding to
+preprocessing, baseline models, controlled ablations, Grad-CAM inspection,
+stronger model families, and data scaling. The final practical classifier is a
+ResNet50-style pipeline using face crops, 224x224 inputs, ImageNet/default
+normalization, and no stochastic augmentation at validation/test time.
+
+For the **generator**, the story starts with raw baseline failures, then locks
+the data pipeline before comparing three parallel model-family branches:
+GAN, VAE, and DDPM. The final comparison keeps quality versus speed central:
+DDPM gives the best saved FID and visual quality, GAN is the best
+quality-speed compromise, and VAE is fastest but gives the smoothest samples.
+
+## How To Read The Project
+
+Start with the classifier notebooks, then read the generator notebooks. The
+generator has one linear setup stage followed by three parallel branches:
+GAN, VAE, and DDPM. Those branches are numbered in reading order, but they are
+conceptually parallel experiments after the pipeline is selected.
+
+### Classifier Notebooks
+
+Read these first:
+
+1. `classifier/notebooks/01_eda.ipynb`  
+   Dataset composition, real/fake source mapping, image statistics, and
+   shortcut risks.
+2. `classifier/notebooks/02_preprocessing.ipynb`  
+   Deterministic preprocessing, train-only augmentation, face crops, and
+   normalization.
+3. `classifier/notebooks/03_phase1_analysis.ipynb`  
+   SimpleCNN and ResNet18 controlled baselines.
+4. `classifier/notebooks/04_phase2_analysis.ipynb`  
+   Resolution, normalization, source holdouts, facecrop, and augmentation
+   ablations.
+5. `classifier/notebooks/05_gradcam_analysis.ipynb`  
+   Qualitative localization analysis across the classifier pipeline.
+6. `classifier/notebooks/06_phase3_model_family_analysis.ipynb`  
+   Stronger pretrained model families and the ResNet50 practical choice.
+7. `classifier/notebooks/07_phase4_data_scaling_analysis.ipynb`  
+   Data scaling for strong backbones and the final classifier decision.
+
+### Generator Notebooks
+
+Read these after the classifier:
+
+1. `generator/notebooks/01_baseline_sanity_check.ipynb`  
+   Raw baseline failures and why the data pipeline must be fixed first.
+2. `generator/notebooks/02_pipeline_selection.ipynb`  
+   Controlled pipeline ablations: resolution, alignment, augmentation, and
+   raw/aligned mixing.
+3. `generator/notebooks/03_gan_stability_progression.ipynb`  
+   GAN branch: DCGAN -> WGAN-GP -> spectral normalization + GroupNorm +
+   self-attention -> 128x128 check.
+4. `generator/notebooks/04_vae_loss_progression.ipynb`  
+   VAE branch: MSE + KL -> perceptual loss -> PatchGAN adversarial loss.
+5. `generator/notebooks/05_ddpm_recipe_progression.ipynb`  
+   DDPM branch: linear schedule -> cosine schedule -> v-prediction -> wider
+   backbone.
+6. `generator/notebooks/06_final_family_comparison.ipynb`  
+   Final comparison of the selected GAN, VAE, and DDPM recipes under saved
+   Phase 5 conditions.
+7. `generator/notebooks/07_final_sample_showcase.ipynb`  
+   Curated final sample examples from saved outputs. This is qualitative
+   showcase material, not a replacement for FID.
+
+## What The Notebooks Do
+
+The notebooks are analysis/report chapters. They load existing configs, logs,
+figures, saved sample grids, checkpoints, and prediction summaries. They are
+not intended to launch new training runs.
+
+When a notebook shows a plot or image grid, the surrounding markdown explains:
+
+- what the artifact shows;
+- why it is needed;
+- how it supports the phase decision;
+- what limitation remains.
+
+This matters because the project is evaluated not only on final performance
+but also on the documented evolution of the solution.
+
+## Repository Layout
+
+```text
 DRL_PROJ/
-  classifier/       ← discriminative model (real vs. fake classifier)
-    src/            ← model definitions, training, evaluation, preprocessing
-    configs/        ← experiment configs organised by phase
-      phase1/       ← baseline models (SimpleCNN, ResNet18)
-      phase2/       ← architecture sweep (ResNet variants, face-crop)
-      phase3/       ← EfficientNet, ViT, frequency-aware training
-      phase4/       ← ensemble strategies
-    tools/          ← analyse.py, ensemble.py, inference.py, facecrop.py
-    notebooks/      ← EDA, preprocessing, evaluation, GradCAM
-    outputs/        ← models, logs, figures (gitignored except .pt/.json)
-    run.py          ← main training entry point
-  generator/        ← generative model (GAN / VAE / diffusion) — in progress
-  pipeline/         ← Vast.ai ephemeral GPU orchestration
-  data/             ← dataset root (gitignored)
-  cropped/          ← MTCNN pre-cropped faces (gitignored)
-    classifier/     ← bbox crops for the classifier
-    generator/      ← landmark-aligned crops for the generator
+  classifier/
+    configs/       experiment configs by phase
+    notebooks/     classifier report notebooks
+    outputs/       saved logs, figures, Grad-CAM panels, checkpoints
+    src/           classifier data, models, training, evaluation
+    tools/         facecrop, Grad-CAM, inference, reevaluation helpers
+
+  generator/
+    configs/       generator configs by phase/family
+    notebooks/     generator report notebooks and notebook builder
+    outputs/       saved logs, sample grids, final showcase artifacts
+    src/           generator data, models, training, metrics
+    tools/         sampling and utility scripts
+
+  data/            original DFF dataset root, not committed
+  cropped/         preprocessed face crops, not committed
+  docs/            project statement and supporting documents
+  pipeline/        optional remote/GPU orchestration helpers
 ```

-## Setup
+## Rebuilding The Generator Notebooks

-Create a local environment when you want to run the code directly on a machine you control:
+The generator notebooks are generated from a single source file:

-```bash
-python3 -m venv .venv
-source .venv/bin/activate
-python -m pip install --upgrade pip setuptools wheel
-python -m pip install -r requirements.txt
+```powershell
+cd generator/notebooks
+python _build.py
 ```

-## Local Training
+That builder writes the numbered generator notebooks listed above. It uses
+existing saved logs and artifacts; it does not train models.

-```bash
-python3 classifier/run.py classifier/configs/phase2/p2_resnet18_facecrop.json
-python3 classifier/run.py classifier/configs/phase3/p3_efficientnet_b0.json
+## Running The Code
+
+Create an environment and install the project requirements:
+
+```powershell
+python -m venv .venv
+.\.venv\Scripts\python.exe -m pip install --upgrade pip setuptools wheel
+.\.venv\Scripts\python.exe -m pip install -r requirements.txt
 ```

-## Ephemeral Vast.ai Pipeline
+The raw dataset should be placed under `data/`. Preprocessed crops are stored
+under `cropped/`. These folders are intentionally not committed.

-The deployment/orchestration path now lives under [`pipeline/`](/run/host/mnt/shared/UP/DRL/DRL_PROJ/pipeline/README.md).
+Execution entry points exist in `classifier/run.py` and `generator/run.py` for
+reproducibility, but the report notebooks should be read as analysis over
+already saved results.

-One-time setup:
+## Final Takeaway

-```bash
-cat > pipeline/.env <<'EOF'
-VAST_API_KEY=<your-api-key>
-VAST_SSH_PRIVATE_KEY=/home/your-user/.ssh/id_ed25519
-EOF
-```
+The project is best understood as a sequence of controlled decisions:

-End-to-end ephemeral run:
+1. cleanly define the data and preprocessing;
+2. establish simple baselines;
+3. improve one factor at a time;
+4. compare model families using saved evidence;
+5. report both performance and limitations.

-```bash
-python3 -m pipeline run classifier/configs/phase2/p2_resnet18_facecrop.json --upload-data
-```
-
-Interactive offer selection:
-
-```bash
-python3 -m pipeline offers --select-offer
-```
-
-You can override the ranking mode per run:
-
-```bash
-python3 -m pipeline offers --sort price
-python3 -m pipeline offers --sort performance
-python3 -m pipeline offers --sort performance --price 0.14
-```
-
-You can also filter by region:
-
-```bash
-python3 -m pipeline offers --select-offer --region europe
-python3 -m pipeline offers --select-offer --region Portugal
-python3 -m pipeline offers --select-offer --region US
-python3 -m pipeline offers --select-offer --region europe --price 0.14
-```
-
-To inspect which region strings are currently available from the search results:
-
-```bash
-python3 -m pipeline offers --list-regions
-```
-
-That command:
- ensures your SSH public key is registered with Vast.ai
- searches offers using the filters in `pipeline/defaults/vast.json`
- creates an instance
- waits for SSH readiness
- syncs the repo
- uploads `data/` when `--upload-data` is set
- runs `python3 classifier/run.py ...`
- downloads `classifier/outputs/`
- for generator runs, rsyncs `generator/outputs/` back every 25 epochs and again at completion
- destroys the instance automatically unless `--keep-on-failure` is set
-
-Useful commands:
-
-```bash
-python3 -m pipeline up
-python3 -m pipeline status <instance_id>
-python3 -m pipeline down <instance_id>
-```
-
-To override the default Vast search/runtime settings, copy `pipeline/defaults/vast.json`, edit it, and pass:
-
-```bash
-python3 -m pipeline run classifier/configs/phase3/p3_efficientnet_b0.json --pipeline-config /path/to/vast.override.json
-```
-
-The default policy in `pipeline/defaults/vast.json` now targets:
- `1x` GPU
- `RTX 3090` or `RTX 3090 Ti`
- `<= $0.20/hour`
- sorted by `dlperf` descending
- uses `vastai/pytorch:latest` as the default image
+The classifier becomes reliable through source-aware preprocessing, stronger
+pretrained backbones, and data scaling. The generator improves by first locking the
+face-aligned pipeline and then selecting the best recipe inside each model
+family before the final GAN/VAE/DDPM comparison.
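
The README's "Rebuilding The Generator Notebooks" section says the numbered notebooks are written by `_build.py` from a single source file. The contents of `_build.py` are not part of this diff, so the following is only a minimal sketch of how such a builder could emit an nbformat 4.5 notebook using the standard library. The helper names `make_cell` and `build_notebook`, the temporary output directory, and the cell contents are hypothetical and are not taken from the repository.

```python
import json
import tempfile
import uuid
from pathlib import Path


def make_cell(cell_type, source):
    """Return one nbformat-4 cell dict (hypothetical helper)."""
    cell = {
        "cell_type": cell_type,
        "id": uuid.uuid4().hex[:8],  # short hex id, like the ids added in this diff
        "metadata": {},
        "source": source.splitlines(keepends=True),
    }
    if cell_type == "code":
        # Code cells additionally carry an execution counter and outputs.
        cell["execution_count"] = None
        cell["outputs"] = []
    return cell


def build_notebook(path, cells):
    """Write a notebook made of (cell_type, source) pairs and return its dict."""
    nb = {
        "cells": [make_cell(kind, src) for kind, src in cells],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 5,
    }
    Path(path).write_text(json.dumps(nb, indent=1), encoding="utf-8")
    return nb


# Assumption: write into a throwaway directory instead of generator/notebooks.
out_dir = Path(tempfile.mkdtemp())
notebook = build_notebook(
    out_dir / "01_baseline_sanity_check.ipynb",
    [("markdown", "# 01 - Baseline Sanity Check\n"), ("code", "import json\n")],
)
```

A builder of this shape only assembles markdown and code cells; it does not train anything, which matches the README's statement that rebuilding relies on existing saved logs and artifacts.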
@@ -2,9 +2,10 @@
 "cells": [
  {
   "cell_type": "markdown",
+   "id": "b6a7c89b",
   "metadata": {},
   "source": [
-    "# Phase 0 - Baseline Sanity Check\n",
+    "# 01 - Baseline Sanity Check\n",
    "\n",
    "Phase 0 is the starting point of the generator story. It uses the raw, un-aligned\n",
    "images and very plain versions of each model family so we can confirm that the\n",
@@ -33,7 +34,15 @@
  {
   "cell_type": "code",
   "execution_count": 1,
-   "metadata": {},
+   "id": "c354bb59",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-14T19:07:52.326287Z",
+     "iopub.status.busy": "2026-05-14T19:07:52.325284Z",
+     "iopub.status.idle": "2026-05-14T19:08:01.950318Z",
+     "shell.execute_reply": "2026-05-14T19:08:01.947303Z"
+    }
+   },
   "outputs": [],
   "source": [
    "import json\n",
@@ -92,17 +101,28 @@
  },
  {
   "cell_type": "markdown",
+   "id": "a6c786f4",
   "metadata": {},
   "source": [
    "## 1. Training loss curves\n",
    "\n",
-    "These curves check that the loops ran and produced stable logs. They are not enough to prove visual quality."
+    "These curves check that the loops ran and produced stable logs. They are not enough to prove visual quality, but they are needed before interpreting samples: a broken optimization loop would make every later visual comparison meaningless.\n",
+    "\n",
+    "**What to look for:** the curves should move smoothly enough to show that each family is learning something. The limitation is that loss scale differs by family, so the curves compare stability, not final image quality."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
-   "metadata": {},
+   "id": "47441617",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-14T19:08:01.960833Z",
+     "iopub.status.busy": "2026-05-14T19:08:01.958829Z",
+     "iopub.status.idle": "2026-05-14T19:08:03.894140Z",
+     "shell.execute_reply": "2026-05-14T19:08:03.891170Z"
+    }
+   },
   "outputs": [
    {
     "data": {
@@ -152,17 +172,28 @@
  },
  {
   "cell_type": "markdown",
+   "id": "836428fe",
   "metadata": {},
   "source": [
    "## 2. Final sample grids\n",
    "\n",
-    "The final previews show the practical failure mode of the raw pipeline: the samples have some face-like structure, but identity, alignment, and detail are not under control. These PNGs are displayed exactly as saved, so older Phase 0 matrices keep their original layout instead of being forced into 4x4."
+    "The final previews show the practical failure mode of the raw pipeline: the samples have some face-like structure, but identity, alignment, and detail are not under control. These PNGs are displayed exactly as saved, so older Phase 0 matrices keep their original layout instead of being forced into 4x4.\n",
+    "\n",
+    "**Why this matters:** this is the visual evidence that the first bottleneck is not only the model family. The data still contains too much pose, scale, and background variation for tiny baseline recipes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
-   "metadata": {},
+   "id": "9389ea9c",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-14T19:08:03.902776Z",
+     "iopub.status.busy": "2026-05-14T19:08:03.901424Z",
+     "iopub.status.idle": "2026-05-14T19:08:05.698092Z",
+     "shell.execute_reply": "2026-05-14T19:08:05.693983Z"
+    }
+   },
   "outputs": [
    {
     "data": {
@@ -192,17 +223,28 @@
  },
  {
   "cell_type": "markdown",
+   "id": "b596f509",
   "metadata": {},
   "source": [
    "## 3. Progression - early vs late\n",
    "\n",
-    "The progression grids make the baseline failure visible over time. Later samples improve slightly, but the raw input distribution keeps the task too broad. The saved matrices are shown in their original layout."
+    "The progression grids make the baseline failure visible over time. Later samples improve slightly, but the raw input distribution keeps the task too broad. The saved matrices are shown in their original layout.\n",
+    "\n",
+    "**How to read it:** if more epochs only turn noise into rough face-like blobs, the next decision should be pipeline cleanup rather than simply training the same recipe longer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
-   "metadata": {},
+   "id": "01959758",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-14T19:08:05.750674Z",
+     "iopub.status.busy": "2026-05-14T19:08:05.748669Z",
+     "iopub.status.idle": "2026-05-14T19:08:09.444392Z",
+     "shell.execute_reply": "2026-05-14T19:08:09.441869Z"
+    }
+   },
   "outputs": [
    {
     "data": {
@@ -257,6 +299,7 @@
  },
  {
   "cell_type": "markdown",
+   "id": "85964677",
   "metadata": {},
   "source": [
    "## 4. What this phase proves\n",
@@ -2,9 +2,10 @@
 "cells": [
  {
   "cell_type": "markdown",
+   "id": "b5ec2417",
   "metadata": {},
   "source": [
-    "# Phase 2 - GAN Progression\n",
+    "# 03 - GAN Stability Progression\n",
    "\n",
    "Phase 2 keeps the Phase 1 pipeline fixed and changes the GAN recipe. This makes\n",
    "the question narrow: once the data is aligned, what model changes are needed to\n",
@@ -29,6 +30,7 @@
  },
  {
   "cell_type": "markdown",
+   "id": "e8e9b53c",
   "metadata": {},
   "source": [
    "> ### FID is not comparable across phases\n",
@@ -49,6 +51,7 @@
  },
  {
   "cell_type": "markdown",
+   "id": "cc979d85",
   "metadata": {},
   "source": [
    "### Reference: Phase 0 baseline from the same family\n",
@@ -60,9 +63,16 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 9,
-   "id": "bf821370",
-   "metadata": {},
+   "execution_count": 1,
+   "id": "a352836b",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-14T19:08:45.741982Z",
+     "iopub.status.busy": "2026-05-14T19:08:45.741982Z",
+     "iopub.status.idle": "2026-05-14T19:08:47.336989Z",
+     "shell.execute_reply": "2026-05-14T19:08:47.334437Z"
+    }
+   },
   "outputs": [],
   "source": [
    "import json\n",
@@ -75,6 +85,12 @@
    "\n",
    "plt.rcParams.update({\"figure.dpi\": 120, \"font.size\": 10})\n",
    "\n",
+    "try:\n",
+    "    display\n",
+    "except NameError:\n",
+    "    def display(obj):\n",
+    "        print(obj)\n",
+    "\n",
    "def find_generator_root():\n",
    "    for base in [Path.cwd(), *Path.cwd().parents]:\n",
    "        for candidate in [base, base / \"generator\"]:\n",
@@ -115,7 +131,7 @@
  },
  {
   "cell_type": "markdown",
-   "id": "f627af73",
+   "id": "e1871b79",
   "metadata": {},
   "source": [
    "## 1. Load experiment logs\n",
@@ -125,9 +141,16 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 10,
-   "id": "59f61b4e",
-   "metadata": {},
+   "execution_count": 2,
+   "id": "32a2b843",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-14T19:08:47.342985Z",
+     "iopub.status.busy": "2026-05-14T19:08:47.342985Z",
+     "iopub.status.idle": "2026-05-14T19:08:47.368505Z",
+     "shell.execute_reply": "2026-05-14T19:08:47.365495Z"
+    }
+   },
   "outputs": [
    {
     "name": "stdout",
@@ -163,7 +186,7 @@
  },
  {
   "cell_type": "markdown",
-   "id": "c1bad44a",
+   "id": "494c64aa",
   "metadata": {},
   "source": [
    "## 2. FID comparison table\n",
@@ -173,72 +196,79 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 11,
-   "id": "528d3bb2",
-   "metadata": {},
+   "execution_count": 3,
+   "id": "72a04040",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-14T19:08:47.376020Z",
+     "iopub.status.busy": "2026-05-14T19:08:47.375017Z",
+     "iopub.status.idle": "2026-05-14T19:08:47.640711Z",
+     "shell.execute_reply": "2026-05-14T19:08:47.638190Z"
+    }
+   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style type=\"text/css\">\n",
       "</style>\n",
-       "<table id=\"T_0b020\">\n",
+       "<table id=\"T_cf90d\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th class=\"blank level0\" >&nbsp;</th>\n",
-       "      <th id=\"T_0b020_level0_col0\" class=\"col_heading level0 col0\" >Run</th>\n",
-       "      <th id=\"T_0b020_level0_col1\" class=\"col_heading level0 col1\" >FID@25</th>\n",
-       "      <th id=\"T_0b020_level0_col2\" class=\"col_heading level0 col2\" >FID@50</th>\n",
-       "      <th id=\"T_0b020_level0_col3\" class=\"col_heading level0 col3\" >FID@100</th>\n",
-       "      <th id=\"T_0b020_level0_col4\" class=\"col_heading level0 col4\" >Best FID</th>\n",
-       "      <th id=\"T_0b020_level0_col5\" class=\"col_heading level0 col5\" >Train (min)</th>\n",
+       "      <th id=\"T_cf90d_level0_col0\" class=\"col_heading level0 col0\" >Run</th>\n",
+       "      <th id=\"T_cf90d_level0_col1\" class=\"col_heading level0 col1\" >FID@25</th>\n",
+       "      <th id=\"T_cf90d_level0_col2\" class=\"col_heading level0 col2\" >FID@50</th>\n",
+       "      <th id=\"T_cf90d_level0_col3\" class=\"col_heading level0 col3\" >FID@100</th>\n",
+       "      <th id=\"T_cf90d_level0_col4\" class=\"col_heading level0 col4\" >Best FID</th>\n",
+       "      <th id=\"T_cf90d_level0_col5\" class=\"col_heading level0 col5\" >Train (min)</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
-       "      <th id=\"T_0b020_level0_row0\" class=\"row_heading level0 row0\" >2</th>\n",
-       "      <td id=\"T_0b020_row0_col0\" class=\"data row0 col0\" >2.3 + SN + Attn</td>\n",
-       "      <td id=\"T_0b020_row0_col1\" class=\"data row0 col1\" >274.4</td>\n",
-       "      <td id=\"T_0b020_row0_col2\" class=\"data row0 col2\" >223.2</td>\n",
-       "      <td id=\"T_0b020_row0_col3\" class=\"data row0 col3\" >110.1</td>\n",
-       "      <td id=\"T_0b020_row0_col4\" class=\"data row0 col4\" >110.1</td>\n",
-       "      <td id=\"T_0b020_row0_col5\" class=\"data row0 col5\" >39.0</td>\n",
+       "      <th id=\"T_cf90d_level0_row0\" class=\"row_heading level0 row0\" >2</th>\n",
+       "      <td id=\"T_cf90d_row0_col0\" class=\"data row0 col0\" >2.3 + SN + Attn</td>\n",
+       "      <td id=\"T_cf90d_row0_col1\" class=\"data row0 col1\" >274.4</td>\n",
+       "      <td id=\"T_cf90d_row0_col2\" class=\"data row0 col2\" >223.2</td>\n",
+       "      <td id=\"T_cf90d_row0_col3\" class=\"data row0 col3\" >110.1</td>\n",
+       "      <td id=\"T_cf90d_row0_col4\" class=\"data row0 col4\" >110.1</td>\n",
+       "      <td id=\"T_cf90d_row0_col5\" class=\"data row0 col5\" >39.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
-       "      <th id=\"T_0b020_level0_row1\" class=\"row_heading level0 row1\" >3</th>\n",
-       "      <td id=\"T_0b020_row1_col0\" class=\"data row1 col0\" >2.4 + 128x128</td>\n",
-       "      <td id=\"T_0b020_row1_col1\" class=\"data row1 col1\" >428.6</td>\n",
-       "      <td id=\"T_0b020_row1_col2\" class=\"data row1 col2\" >264.3</td>\n",
-       "      <td id=\"T_0b020_row1_col3\" class=\"data row1 col3\" >186.0</td>\n",
-       "      <td id=\"T_0b020_row1_col4\" class=\"data row1 col4\" >186.0</td>\n",
-       "      <td id=\"T_0b020_row1_col5\" class=\"data row1 col5\" >97.7</td>\n",
+       "      <th id=\"T_cf90d_level0_row1\" class=\"row_heading level0 row1\" >3</th>\n",
+       "      <td id=\"T_cf90d_row1_col0\" class=\"data row1 col0\" >2.4 + 128x128</td>\n",
+       "      <td id=\"T_cf90d_row1_col1\" class=\"data row1 col1\" >428.6</td>\n",
+       "      <td id=\"T_cf90d_row1_col2\" class=\"data row1 col2\" >264.3</td>\n",
+       "      <td id=\"T_cf90d_row1_col3\" class=\"data row1 col3\" >186.0</td>\n",
+       "      <td id=\"T_cf90d_row1_col4\" class=\"data row1 col4\" >186.0</td>\n",
+       "      <td id=\"T_cf90d_row1_col5\" class=\"data row1 col5\" >97.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
-       "      <th id=\"T_0b020_level0_row2\" class=\"row_heading level0 row2\" >1</th>\n",
-       "      <td id=\"T_0b020_row2_col0\" class=\"data row2 col0\" >2.2 WGAN-GP</td>\n",
-       "      <td id=\"T_0b020_row2_col1\" class=\"data row2 col1\" >489.6</td>\n",
-       "      <td id=\"T_0b020_row2_col2\" class=\"data row2 col2\" >474.6</td>\n",
-       "      <td id=\"T_0b020_row2_col3\" class=\"data row2 col3\" >421.3</td>\n",
-       "      <td id=\"T_0b020_row2_col4\" class=\"data row2 col4\" >421.3</td>\n",
-       "      <td id=\"T_0b020_row2_col5\" class=\"data row2 col5\" >27.1</td>\n",
+       "      <th id=\"T_cf90d_level0_row2\" class=\"row_heading level0 row2\" >1</th>\n",
+       "      <td id=\"T_cf90d_row2_col0\" class=\"data row2 col0\" >2.2 WGAN-GP</td>\n",
+       "      <td id=\"T_cf90d_row2_col1\" class=\"data row2 col1\" >489.6</td>\n",
+       "      <td id=\"T_cf90d_row2_col2\" class=\"data row2 col2\" >474.6</td>\n",
+       "      <td id=\"T_cf90d_row2_col3\" class=\"data row2 col3\" >421.3</td>\n",
+       "      <td id=\"T_cf90d_row2_col4\" class=\"data row2 col4\" >421.3</td>\n",
+       "      <td id=\"T_cf90d_row2_col5\" class=\"data row2 col5\" >27.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
-       "      <th id=\"T_0b020_level0_row3\" class=\"row_heading level0 row3\" >0</th>\n",
-       "      <td id=\"T_0b020_row3_col0\" class=\"data row3 col0\" >2.1 DCGAN (BCE)</td>\n",
-       "      <td id=\"T_0b020_row3_col1\" class=\"data row3 col1\" >444.3</td>\n",
-       "      <td id=\"T_0b020_row3_col2\" class=\"data row3 col2\" >438.9</td>\n",
-       "      <td id=\"T_0b020_row3_col3\" class=\"data row3 col3\" >429.3</td>\n",
-       "      <td id=\"T_0b020_row3_col4\" class=\"data row3 col4\" >429.3</td>\n",
-       "      <td id=\"T_0b020_row3_col5\" class=\"data row3 col5\" >17.8</td>\n",
+       "      <th id=\"T_cf90d_level0_row3\" class=\"row_heading level0 row3\" >0</th>\n",
+       "      <td id=\"T_cf90d_row3_col0\" class=\"data row3 col0\" >2.1 DCGAN (BCE)</td>\n",
+       "      <td id=\"T_cf90d_row3_col1\" class=\"data row3 col1\" >444.3</td>\n",
+       "      <td id=\"T_cf90d_row3_col2\" class=\"data row3 col2\" >438.9</td>\n",
+       "      <td id=\"T_cf90d_row3_col3\" class=\"data row3 col3\" >429.3</td>\n",
+       "      <td id=\"T_cf90d_row3_col4\" class=\"data row3 col4\" >429.3</td>\n",
+       "      <td id=\"T_cf90d_row3_col5\" class=\"data row3 col5\" >17.8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n"
      ],
      "text/plain": [
-       "<pandas.io.formats.style.Styler at 0x2717dc6bfb0>"
+       "<pandas.io.formats.style.Styler at 0x225ffe45010>"
      ]
     },
-     "execution_count": 11,
+     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
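
The rendered table above appears to be ordered by the "Best FID" column, where lower FID is better. As a rough sketch of how that ordering can be reproduced from per-checkpoint values, the snippet below copies the numbers from the table and ranks the runs. It assumes "Best FID" is the minimum over the logged checkpoints, which is consistent with the table but is not confirmed by the notebook code shown in this diff; `fid_by_run` and `best_fid` are illustrative names.

```python
# FID per checkpoint epoch, copied from the table in the diff above.
fid_by_run = {
    "2.1 DCGAN (BCE)": {25: 444.3, 50: 438.9, 100: 429.3},
    "2.2 WGAN-GP": {25: 489.6, 50: 474.6, 100: 421.3},
    "2.3 + SN + Attn": {25: 274.4, 50: 223.2, 100: 110.1},
    "2.4 + 128x128": {25: 428.6, 50: 264.3, 100: 186.0},
}


def best_fid(fids):
    # Assumption: "Best FID" is the minimum over the logged checkpoints.
    return min(fids.values())


# Sort run names from best (lowest FID) to worst.
ranking = sorted(fid_by_run, key=lambda run: best_fid(fid_by_run[run]))
for run in ranking:
    print(f"{run:<18} best FID = {best_fid(fid_by_run[run]):.1f}")
```

Under that assumption, the ranking puts the 2.3 stability package first and leaves 2.1 and 2.2 close together at the bottom, which is the pattern the notebook text describes.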
@@ -265,17 +295,26 @@
  },
  {
   "cell_type": "markdown",
-   "id": "1f77c814",
+   "id": "33bd39b9",
   "metadata": {},
   "source": [
-    "## 3. FID curves - progression"
+    "## 3. FID curves - progression\n",
+    "\n",
+    "This plot shows whether improvements happen gradually or as a step change. It is needed because the final FID table hides training dynamics: here the key story is that the 2.3 stability package changes the whole trajectory, while 2.1 and 2.2 remain collapsed."
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": 12,
-   "id": "6984a849",
-   "metadata": {},
+   "execution_count": 4,
+   "id": "bc19fb53",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-14T19:08:47.648234Z",
+     "iopub.status.busy": "2026-05-14T19:08:47.647237Z",
+     "iopub.status.idle": "2026-05-14T19:08:48.493796Z",
+     "shell.execute_reply": "2026-05-14T19:08:48.490766Z"
+    }
+   },
   "outputs": [
    {
     "data": {
@@ -303,7 +342,7 @@
  },
  {
   "cell_type": "markdown",
-   "id": "f51cdb73",
+   "id": "77d10609",
   "metadata": {},
   "source": [
    "## 4. Training dynamics\n",
@@ -313,9 +352,16 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 13,
-   "id": "0f420780",
-   "metadata": {},
+   "execution_count": 5,
+   "id": "57be87f9",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-14T19:08:48.500457Z",
+     "iopub.status.busy": "2026-05-14T19:08:48.499437Z",
+     "iopub.status.idle": "2026-05-14T19:08:49.538522Z",
+     "shell.execute_reply": "2026-05-14T19:08:49.535987Z"
+    }
+   },
   "outputs": [
    {
     "data": {
@@ -348,7 +394,7 @@
  },
  {
   "cell_type": "markdown",
-   "id": "7c480a9e",
+   "id": "e78fcd57",
   "metadata": {},
   "source": [
    "## 5. Sample grids - epoch 100\n",
@@ -364,9 +410,16 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 14,
-   "id": "7cba06c7",
-   "metadata": {},
+   "execution_count": 6,
+   "id": "1d426097",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-14T19:08:49.546183Z",
+     "iopub.status.busy": "2026-05-14T19:08:49.545170Z",
+     "iopub.status.idle": "2026-05-14T19:08:50.817249Z",
+     "shell.execute_reply": "2026-05-14T19:08:50.813674Z"
+    }
+   },
   "outputs": [
    {
     "data": {
@@ -392,17 +445,26 @@
  },
  {
   "cell_type": "markdown",
-   "id": "e984c1c7",
+   "id": "754f83e7",
   "metadata": {},
   "source": [
-    "## 6. Progression - epoch 10 -> 50 -> 100"
+    "## 6. Progression - epoch 10 -> 50 -> 100\n",
+    "\n",
+    "These panels connect time to visual quality. For the collapsed runs, the gray grids are still information: they show that more epochs did not fix the recipe. For the stabilized run, the same timeline shows recognizable faces emerging."
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": 15,
-   "id": "bf39e6ec",
-   "metadata": {},
+   "execution_count": 7,
+   "id": "77d5b876",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-14T19:08:50.842106Z",
+     "iopub.status.busy": "2026-05-14T19:08:50.842106Z",
+     "iopub.status.idle": "2026-05-14T19:08:54.153113Z",
+     "shell.execute_reply": "2026-05-14T19:08:54.151583Z"
+    }
+   },
   "outputs": [
    {
     "data": {
@@ -463,17 +525,26 @@
  },
  {
   "cell_type": "markdown",
-   "id": "3800b23b",
+   "id": "fdb18bce",
   "metadata": {},
   "source": [
-    "## 7. Pairwise comparison - what each step bought us"
+    "## 7. Pairwise comparison - what each step bought us\n",
+    "\n",
+    "Each pair isolates one decision. The purpose is to avoid saying simply that the final GAN is better: the comparison shows that Wasserstein loss alone is insufficient, the stability package is decisive, and 128x128 is premature under the saved compute budget."
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": 16,
-   "id": "a113d215",
-   "metadata": {},
+   "execution_count": 8,
+   "id": "4aed1b79",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-14T19:08:54.188809Z",
+     "iopub.status.busy": "2026-05-14T19:08:54.186606Z",
+     "iopub.status.idle": "2026-05-14T19:08:56.310301Z",
+     "shell.execute_reply": "2026-05-14T19:08:56.307480Z"
+    }
+   },
   "outputs": [
    {
     "data": {
@@ -527,6 +598,7 @@
  },
  {
   "cell_type": "markdown",
+   "id": "95fd4f90",
   "metadata": {},
   "source": [
    "## 8. What this phase proves\n",
@@ -555,11 +627,21 @@
 ],
 "metadata": {
  "kernelspec": {
-   "display_name": "Python 3",
+   "display_name": ".venv",
+   "language": "python",
   "name": "python3"
  },
  "language_info": {
-   "name": "python"
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.10"
  }
 },
 "nbformat": 4,
@@ -2,9 +2,10 @@
 "cells": [
  {
   "cell_type": "markdown",
+   "id": "2d43e849",
   "metadata": {},
   "source": [
-    "# Phase 4 - DDPM Progression\n",
+    "# 05 - DDPM Recipe Progression\n",
    "\n",
    "Phase 4 applies the same report logic to diffusion models. The pipeline is\n",
    "already fixed, so this notebook isolates the DDPM recipe: schedule, prediction\n",
@@ -44,6 +45,7 @@
  },
  {
   "cell_type": "markdown",
+   "id": "fe4b4147",
   "metadata": {},
   "source": [
    "### Reference: Phase 0 baseline from the same family\n",
@@ -57,7 +59,15 @@
  {
   "cell_type": "code",
   "execution_count": 1,
-   "metadata": {},
+   "id": "ad73d7ca",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-14T19:09:11.615922Z",
+     "iopub.status.busy": "2026-05-14T19:09:11.615922Z",
+     "iopub.status.idle": "2026-05-14T19:09:13.772612Z",
+     "shell.execute_reply": "2026-05-14T19:09:13.769968Z"
+    }
+   },
   "outputs": [],
   "source": [
    "import json\n",
@@ -116,6 +126,7 @@
  },
  {
   "cell_type": "markdown",
+   "id": "2fd648bc",
   "metadata": {},
   "source": [
    "## 1. Load experiment logs\n",
@@ -126,7 +137,15 @@
  {
   "cell_type": "code",
   "execution_count": 2,
-   "metadata": {},
+   "id": "5abd8b09",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-14T19:09:13.780145Z",
+     "iopub.status.busy": "2026-05-14T19:09:13.779153Z",
+     "iopub.status.idle": "2026-05-14T19:09:13.800005Z",
+     "shell.execute_reply": "2026-05-14T19:09:13.797633Z"
+    }
+   },
   "outputs": [
    {
     "name": "stdout",
@@ -154,6 +173,7 @@
  },
  {
   "cell_type": "markdown",
+   "id": "4d563bd4",
   "metadata": {},
   "source": [
    "## 2. FID comparison table\n",
@@ -164,72 +184,80 @@
  {
   "cell_type": "code",
   "execution_count": 3,
-   "metadata": {},
+   "id": "bf039a10",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-14T19:09:13.808455Z",
+     "iopub.status.busy": "2026-05-14T19:09:13.807458Z",
+     "iopub.status.idle": "2026-05-14T19:09:14.118770Z",
+     "shell.execute_reply": "2026-05-14T19:09:14.115475Z"
+    }
+   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style type=\"text/css\">\n",
       "</style>\n",
-       "<table id=\"T_7ea0f\">\n",
+       "<table id=\"T_2697a\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th class=\"blank level0\" >&nbsp;</th>\n",
-       "      <th id=\"T_7ea0f_level0_col0\" class=\"col_heading level0 col0\" >Run</th>\n",
-       "      <th id=\"T_7ea0f_level0_col1\" class=\"col_heading level0 col1\" >FID@25</th>\n",
-       "      <th id=\"T_7ea0f_level0_col2\" class=\"col_heading level0 col2\" >FID@50</th>\n",
-       "      <th id=\"T_7ea0f_level0_col3\" class=\"col_heading level0 col3\" >FID@100</th>\n",
-       "      <th id=\"T_7ea0f_level0_col4\" class=\"col_heading level0 col4\" >Best FID</th>\n",
-       "      <th id=\"T_7ea0f_level0_col5\" class=\"col_heading level0 col5\" >Loss@100</th>\n",
-       "      <th id=\"T_7ea0f_level0_col6\" class=\"col_heading level0 col6\" >Train (min)</th>\n",
+       "      <th id=\"T_2697a_level0_col0\" class=\"col_heading level0 col0\" >Run</th>\n",
+       "      <th id=\"T_2697a_level0_col1\" class=\"col_heading level0 col1\" >FID@25</th>\n",
+       "      <th id=\"T_2697a_level0_col2\" class=\"col_heading level0 col2\" >FID@50</th>\n",
+       "      <th id=\"T_2697a_level0_col3\" class=\"col_heading level0 col3\" >FID@100</th>\n",
+       "      <th id=\"T_2697a_level0_col4\" class=\"col_heading level0 col4\" >Best FID</th>\n",
+       "      <th id=\"T_2697a_level0_col5\" class=\"col_heading level0 col5\" >Loss@100</th>\n",
+       "      <th id=\"T_2697a_level0_col6\" class=\"col_heading level0 col6\" >Train (min)</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
-       "      <th id=\"T_7ea0f_level0_row0\" class=\"row_heading level0 row0\" >3</th>\n",
-       "      <td id=\"T_7ea0f_row0_col0\" class=\"data row0 col0\" >4.4 wider net</td>\n",
-       "      <td id=\"T_7ea0f_row0_col1\" class=\"data row0 col1\" >325.0</td>\n",
-       "      <td id=\"T_7ea0f_row0_col2\" class=\"data row0 col2\" >170.9</td>\n",
-       "      <td id=\"T_7ea0f_row0_col3\" class=\"data row0 col3\" >30.0</td>\n",
-       "      <td id=\"T_7ea0f_row0_col4\" class=\"data row0 col4\" >30.0</td>\n",
-       "      <td id=\"T_7ea0f_row0_col5\" class=\"data row0 col5\" >0.0582</td>\n",
-       "      <td id=\"T_7ea0f_row0_col6\" class=\"data row0 col6\" >536.4</td>\n",
+       "      <th id=\"T_2697a_level0_row0\" class=\"row_heading level0 row0\" >3</th>\n",
+       "      <td id=\"T_2697a_row0_col0\" class=\"data row0 col0\" >4.4 wider net</td>\n",
+       "      <td id=\"T_2697a_row0_col1\" class=\"data row0 col1\" >325.0</td>\n",
+       "      <td id=\"T_2697a_row0_col2\" class=\"data row0 col2\" >170.9</td>\n",
+       "      <td id=\"T_2697a_row0_col3\" class=\"data row0 col3\" >30.0</td>\n",
+       "      <td id=\"T_2697a_row0_col4\" class=\"data row0 col4\" >30.0</td>\n",
+       "      <td id=\"T_2697a_row0_col5\" class=\"data row0 col5\" >0.0582</td>\n",
+       "      <td id=\"T_2697a_row0_col6\" class=\"data row0 col6\" >536.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
-       "      <th id=\"T_7ea0f_level0_row1\" class=\"row_heading level0 row1\" >2</th>\n",
-       "      <td id=\"T_7ea0f_row1_col0\" class=\"data row1 col0\" >4.3 cosine / v</td>\n",
-       "      <td id=\"T_7ea0f_row1_col1\" class=\"data row1 col1\" >315.9</td>\n",
-       "      <td id=\"T_7ea0f_row1_col2\" class=\"data row1 col2\" >160.7</td>\n",
-       "      <td id=\"T_7ea0f_row1_col3\" class=\"data row1 col3\" >34.5</td>\n",
-       "      <td id=\"T_7ea0f_row1_col4\" class=\"data row1 col4\" >34.5</td>\n",
-       "      <td id=\"T_7ea0f_row1_col5\" class=\"data row1 col5\" >0.0594</td>\n",
-       "      <td id=\"T_7ea0f_row1_col6\" class=\"data row1 col6\" >278.8</td>\n",
+       "      <th id=\"T_2697a_level0_row1\" class=\"row_heading level0 row1\" >2</th>\n",
+       "      <td id=\"T_2697a_row1_col0\" class=\"data row1 col0\" >4.3 cosine / v</td>\n",
+       "      <td id=\"T_2697a_row1_col1\" class=\"data row1 col1\" >315.9</td>\n",
+       "      <td id=\"T_2697a_row1_col2\" class=\"data row1 col2\" >160.7</td>\n",
+       "      <td id=\"T_2697a_row1_col3\" class=\"data row1 col3\" >34.5</td>\n",
+       "      <td id=\"T_2697a_row1_col4\" class=\"data row1 col4\" >34.5</td>\n",
+       "      <td id=\"T_2697a_row1_col5\" class=\"data row1 col5\" >0.0594</td>\n",
+       "      <td id=\"T_2697a_row1_col6\" class=\"data row1 col6\" >278.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
-       "      <th id=\"T_7ea0f_level0_row2\" class=\"row_heading level0 row2\" >1</th>\n",
-       "      <td id=\"T_7ea0f_row2_col0\" class=\"data row2 col0\" >4.2 cosine / epsilon</td>\n",
-       "      <td id=\"T_7ea0f_row2_col1\" class=\"data row2 col1\" >249.7</td>\n",
-       "      <td id=\"T_7ea0f_row2_col2\" class=\"data row2 col2\" >282.2</td>\n",
-       "      <td id=\"T_7ea0f_row2_col3\" class=\"data row2 col3\" >132.3</td>\n",
-       "      <td id=\"T_7ea0f_row2_col4\" class=\"data row2 col4\" >132.3</td>\n",
-       "      <td id=\"T_7ea0f_row2_col5\" class=\"data row2 col5\" >0.0285</td>\n",
-       "      <td id=\"T_7ea0f_row2_col6\" class=\"data row2 col6\" >258.8</td>\n",
+       "      <th id=\"T_2697a_level0_row2\" class=\"row_heading level0 row2\" >1</th>\n",
+       "      <td id=\"T_2697a_row2_col0\" class=\"data row2 col0\" >4.2 cosine / epsilon</td>\n",
+       "      <td id=\"T_2697a_row2_col1\" class=\"data row2 col1\" >249.7</td>\n",
+       "      <td id=\"T_2697a_row2_col2\" class=\"data row2 col2\" >282.2</td>\n",
+       "      <td id=\"T_2697a_row2_col3\" class=\"data row2 col3\" >132.3</td>\n",
+       "      <td id=\"T_2697a_row2_col4\" class=\"data row2 col4\" >132.3</td>\n",
+       "      <td id=\"T_2697a_row2_col5\" class=\"data row2 col5\" >0.0285</td>\n",
+       "      <td id=\"T_2697a_row2_col6\" class=\"data row2 col6\" >258.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
-       "      <th id=\"T_7ea0f_level0_row3\" class=\"row_heading level0 row3\" >0</th>\n",
-       "      <td id=\"T_7ea0f_row3_col0\" class=\"data row3 col0\" >4.1 linear / epsilon</td>\n",
-       "      <td id=\"T_7ea0f_row3_col1\" class=\"data row3 col1\" >333.3</td>\n",
-       "      <td id=\"T_7ea0f_row3_col2\" class=\"data row3 col2\" >311.4</td>\n",
-       "      <td id=\"T_7ea0f_row3_col3\" class=\"data row3 col3\" >134.5</td>\n",
-       "      <td id=\"T_7ea0f_row3_col4\" class=\"data row3 col4\" >134.5</td>\n",
-       "      <td id=\"T_7ea0f_row3_col5\" class=\"data row3 col5\" >0.0150</td>\n",
-       "      <td id=\"T_7ea0f_row3_col6\" class=\"data row3 col6\" >259.9</td>\n",
+       "      <th id=\"T_2697a_level0_row3\" class=\"row_heading level0 row3\" >0</th>\n",
+       "      <td id=\"T_2697a_row3_col0\" class=\"data row3 col0\" >4.1 linear / epsilon</td>\n",
+       "      <td id=\"T_2697a_row3_col1\" class=\"data row3 col1\" >333.3</td>\n",
+       "      <td id=\"T_2697a_row3_col2\" class=\"data row3 col2\" >311.4</td>\n",
+       "      <td id=\"T_2697a_row3_col3\" class=\"data row3 col3\" >134.5</td>\n",
+       "      <td id=\"T_2697a_row3_col4\" class=\"data row3 col4\" >134.5</td>\n",
+       "      <td id=\"T_2697a_row3_col5\" class=\"data row3 col5\" >0.0150</td>\n",
+       "      <td id=\"T_2697a_row3_col6\" class=\"data row3 col6\" >259.9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n"
      ],
      "text/plain": [
-       "<pandas.io.formats.style.Styler at 0x25ae55fd8b0>"
+       "<pandas.io.formats.style.Styler at 0x204a4dc4950>"
      ]
     },
     "execution_count": 3,
@@ -258,15 +286,26 @@
  },
  {
   "cell_type": "markdown",
+   "id": "f914b6a3",
   "metadata": {},
   "source": [
-    "## 3. FID curves - progression"
+    "## 3. FID curves - progression\n",
+    "\n",
+    "The curves make the DDPM recipe evolution visible. The main evidence is not only that the wider final model wins, but that the big jump happens when the prediction target changes to v-prediction."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
-   "metadata": {},
+   "id": "f4473091",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-14T19:09:14.126299Z",
+     "iopub.status.busy": "2026-05-14T19:09:14.126299Z",
+     "iopub.status.idle": "2026-05-14T19:09:14.871321Z",
+     "shell.execute_reply": "2026-05-14T19:09:14.868302Z"
+    }
+   },
   "outputs": [
    {
     "data": {
@@ -294,6 +333,7 @@
  },
  {
   "cell_type": "markdown",
+   "id": "ebf477f6",
   "metadata": {},
   "source": [
    "## 4. Training loss\n",
@@ -304,7 +344,15 @@
  {
   "cell_type": "code",
   "execution_count": 5,
-   "metadata": {},
+   "id": "7aa4954f",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-14T19:09:14.879854Z",
+     "iopub.status.busy": "2026-05-14T19:09:14.878851Z",
+     "iopub.status.idle": "2026-05-14T19:09:15.438805Z",
+     "shell.execute_reply": "2026-05-14T19:09:15.437273Z"
+    }
+   },
   "outputs": [
    {
     "data": {
@@ -332,15 +380,26 @@
  },
  {
   "cell_type": "markdown",
+   "id": "583bc733",
   "metadata": {},
   "source": [
-    "## 5. Sample grids - epoch 100"
+    "## 5. Sample grids - epoch 100\n",
+    "\n",
+    "These grids show the qualitative side of the FID drop. They should be read as independent samples from each checkpoint, with attention to global face structure, texture noise, and artifact frequency."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
-   "metadata": {},
+   "id": "db2f8780",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-14T19:09:15.444735Z",
+     "iopub.status.busy": "2026-05-14T19:09:15.444735Z",
+     "iopub.status.idle": "2026-05-14T19:09:17.113892Z",
+     "shell.execute_reply": "2026-05-14T19:09:17.109886Z"
+    }
+   },
   "outputs": [
    {
     "data": {
@@ -367,6 +426,7 @@
  },
  {
   "cell_type": "markdown",
+   "id": "62183fef",
   "metadata": {},
   "source": [
    "## 6. Progression - epoch 10 -> 50 -> 100\n",
@@ -377,7 +437,15 @@
  {
   "cell_type": "code",
   "execution_count": 7,
-   "metadata": {},
+   "id": "8fc44956",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-14T19:09:17.166548Z",
+     "iopub.status.busy": "2026-05-14T19:09:17.165543Z",
+     "iopub.status.idle": "2026-05-14T19:09:20.642400Z",
+     "shell.execute_reply": "2026-05-14T19:09:20.638875Z"
+    }
+   },
   "outputs": [
    {
     "data": {
@@ -438,6 +506,7 @@
  },
  {
   "cell_type": "markdown",
+   "id": "1e4e025a",
   "metadata": {},
   "source": [
    "## 7. Noise schedule visualization\n",
@@ -448,7 +517,15 @@
  {
   "cell_type": "code",
   "execution_count": 8,
-   "metadata": {},
+   "id": "ad0a9460",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-14T19:09:20.684626Z",
+     "iopub.status.busy": "2026-05-14T19:09:20.683477Z",
+     "iopub.status.idle": "2026-05-14T19:09:21.615882Z",
+     "shell.execute_reply": "2026-05-14T19:09:21.613250Z"
+    }
+   },
   "outputs": [
    {
     "data": {
@@ -485,6 +562,7 @@
  },
  {
   "cell_type": "markdown",
+   "id": "ae1b0cff",
   "metadata": {},
   "source": [
    "## 8. What this phase proves\n",
@@ -2,9 +2,10 @@
 "cells": [
  {
   "cell_type": "markdown",
+   "id": "c493f3b6",
   "metadata": {},
   "source": [
-    "# Phase 6 - Final Selected Samples\n",
+    "# 07 - Final Sample Showcase\n",
    "\n",
    "This final notebook is a small showcase chapter. Phase 5 compared the model\n",
    "families quantitatively; this notebook selects the three strongest individual\n",
@@ -28,7 +29,15 @@
  {
   "cell_type": "code",
   "execution_count": 1,
-   "metadata": {},
+   "id": "de83c749",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-14T19:09:57.835552Z",
+     "iopub.status.busy": "2026-05-14T19:09:57.834037Z",
+     "iopub.status.idle": "2026-05-14T19:10:07.120969Z",
+     "shell.execute_reply": "2026-05-14T19:10:07.118954Z"
+    }
+   },
   "outputs": [],
   "source": [
    "import json\n",
@@ -101,6 +110,7 @@
  },
  {
   "cell_type": "markdown",
+   "id": "c1549235",
   "metadata": {},
   "source": [
    "## 1. Candidate pool\n",
@@ -112,7 +122,15 @@
  {
   "cell_type": "code",
   "execution_count": 2,
-   "metadata": {},
+   "id": "3d6920c0",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-14T19:10:07.129524Z",
+     "iopub.status.busy": "2026-05-14T19:10:07.128523Z",
+     "iopub.status.idle": "2026-05-14T19:10:07.186457Z",
+     "shell.execute_reply": "2026-05-14T19:10:07.184404Z"
+    }
+   },
   "outputs": [
    {
     "data": {
@@ -204,6 +222,7 @@
  },
  {
   "cell_type": "markdown",
+   "id": "ddf6948b",
   "metadata": {},
   "source": [
    "## 2. Selection method\n",
@@ -222,7 +241,15 @@
  {
   "cell_type": "code",
   "execution_count": 3,
-   "metadata": {},
+   "id": "5b9c7533",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-14T19:10:07.195450Z",
+     "iopub.status.busy": "2026-05-14T19:10:07.194451Z",
+     "iopub.status.idle": "2026-05-14T19:10:12.841338Z",
+     "shell.execute_reply": "2026-05-14T19:10:12.837814Z"
+    }
+   },
   "outputs": [
    {
     "data": {
@@ -361,6 +388,7 @@
  },
  {
   "cell_type": "markdown",
+   "id": "765d5532",
   "metadata": {},
   "source": [
    "## 3. Top three per architecture\n",
@@ -372,7 +400,15 @@
  {
   "cell_type": "code",
   "execution_count": 4,
-   "metadata": {},
+   "id": "91930ee8",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-14T19:10:12.848661Z",
+     "iopub.status.busy": "2026-05-14T19:10:12.846661Z",
+     "iopub.status.idle": "2026-05-14T19:10:13.415473Z",
+     "shell.execute_reply": "2026-05-14T19:10:13.412445Z"
+    }
+   },
   "outputs": [
    {
     "data": {
@@ -577,6 +613,7 @@
  },
  {
   "cell_type": "markdown",
+   "id": "ff25482c",
   "metadata": {},
   "source": [
    "## 4. Final selected images\n",
@@ -587,7 +624,15 @@
  {
   "cell_type": "code",
   "execution_count": 5,
-   "metadata": {},
+   "id": "57b87ab7",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-05-14T19:10:13.422889Z",
+     "iopub.status.busy": "2026-05-14T19:10:13.420888Z",
+     "iopub.status.idle": "2026-05-14T19:10:14.341719Z",
+     "shell.execute_reply": "2026-05-14T19:10:14.339886Z"
+    }
+   },
   "outputs": [
    {
     "data": {
@@ -611,6 +656,7 @@
  },
  {
   "cell_type": "markdown",
+   "id": "8cd4f5cd",
   "metadata": {},
   "source": [
    "## 5. Report conclusion\n",
@@ -9,6 +9,7 @@ story structure: goal, what changed, evidence, decision, and conclusion.

 Real metric values are pulled from outputs/logs/*.json at build time and
 rendered into markdown headers and conclusions, so reports never drift from data.
+The generated filenames are numbered to make the intended reading order clear.
 """
 import json
 from pathlib import Path
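Review note: the docstring above says metric values are read from `outputs/logs/*.json` at build time and rendered into markdown. The sketch below illustrates that idea in isolation. The log schema (a top-level `"fid"` mapping from epoch to value) and the helper names `best_fid` and `render_header` are assumptions made for illustration; the real log layout is not shown in this diff. The example values are the FID@25/50/100 figures of the "4.3 cosine / v" row in the DDPM table.

```python
import json
import tempfile
from pathlib import Path


def best_fid(log):
    """Return (epoch, value) of the lowest FID in a log.

    Assumes a hypothetical schema: {"fid": {"<epoch>": <float>, ...}}.
    """
    epoch, value = min(log["fid"].items(), key=lambda kv: kv[1])
    return int(epoch), value


def render_header(title, log):
    """Bake the log-derived metric into a markdown header string."""
    epoch, value = best_fid(log)
    return f"# {title}\n\nBest FID: {value:.1f} (epoch {epoch})\n"


def demo():
    # Write a throwaway log file, then read it back the way a build step would.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "example_run.json"
        path.write_text(json.dumps({"fid": {"25": 315.9, "50": 160.7, "100": 34.5}}))
        log = json.loads(path.read_text())
        return render_header("05 - DDPM Recipe Progression", log)


if __name__ == "__main__":
    print(demo())
```

Because the number is formatted from the log at build time, regenerating the notebooks after a new run updates the text without manual edits.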
@@ -17,6 +18,18 @@ ROOT = Path(__file__).resolve().parents[1]
 LOGS = ROOT / "outputs" / "logs"
 OUT  = ROOT / "notebooks"

+NOTEBOOK_SEQUENCE = {
+    "phase0": "01_baseline_sanity_check",
+    "phase1": "02_pipeline_selection",
+    "phase2": "03_gan_stability_progression",
+    "phase3": "04_vae_loss_progression",
+    "phase4": "05_ddpm_recipe_progression",
+    "phase5": "06_final_family_comparison",
+    "phase6": "07_final_sample_showcase",
+}
+
+OLD_GENERATED_PATTERNS = ["phase*.ipynb"]
+

 # notebook helpers
 def md(text):     return {"cell_type": "markdown", "metadata": {}, "source": text.splitlines(keepends=True)}
@@ -36,6 +49,12 @@ def write_nb(name, cells):
    path.write_text(json.dumps(nb, indent=1))
    print(f"  wrote {path.relative_to(ROOT)}")

+def remove_old_generated_notebooks():
+    for pattern in OLD_GENERATED_PATTERNS:
+        for path in OUT.glob(pattern):
+            path.unlink()
+            print(f"  removed {path.relative_to(ROOT)}")
+

 # log-derived facts, computed once and baked into markdown
 def load(name):
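Review note: the numbered names and the cleanup helper added above can be exercised without touching the real `notebooks/` folder. This is a minimal sketch that runs in a temporary directory. It changes the helper to take the directory as a parameter and to return the removed names instead of printing them; both changes are assumptions made for testability and are not part of the commit.

```python
import tempfile
from pathlib import Path

# Subset of the mapping from the commit, enough for the demonstration.
NOTEBOOK_SEQUENCE = {
    "phase0": "01_baseline_sanity_check",
    "phase4": "05_ddpm_recipe_progression",
}
OLD_GENERATED_PATTERNS = ["phase*.ipynb"]


def remove_old_generated_notebooks(out_dir, patterns=OLD_GENERATED_PATTERNS):
    """Delete notebooks that match the old naming scheme; return their names."""
    removed = []
    for pattern in patterns:
        # sorted() materializes the matches before any file is deleted.
        for path in sorted(out_dir.glob(pattern)):
            path.unlink()
            removed.append(path.name)
    return removed


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)
        # Two stale notebooks with the old names, one with a new numbered name.
        (out / "phase0_analysis.ipynb").write_text("{}")
        (out / "phase4_analysis.ipynb").write_text("{}")
        (out / (NOTEBOOK_SEQUENCE["phase4"] + ".ipynb")).write_text("{}")
        removed = remove_old_generated_notebooks(out)
        remaining = sorted(p.name for p in out.iterdir())
        return removed, remaining


if __name__ == "__main__":
    print(demo())
```

One point to keep in mind: the pattern `phase*.ipynb` matches any notebook whose name starts with `phase`, so a hand-written notebook with that prefix in the output folder would also be deleted.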
@@ -120,7 +139,7 @@ def build_phase0():
    p0 = {n: load(n) for n in ["p0_wgan", "p0_vae", "p0_ddpm", "p0_ddpm_small"]}
    cells = [
        md(f"""\
-# Phase 0 - Baseline Sanity Check
+# 01 - Baseline Sanity Check

 Phase 0 is the starting point of the generator story. It uses the raw, un-aligned
 images and very plain versions of each model family so we can confirm that the
@@ -146,7 +165,7 @@ FID was not logged in Phase 0. The evidence here is loss behavior plus saved
 sample grids.
 """),
        code(SHARED_IMPORTS),
-        md("## 1. Training loss curves\n\nThese curves check that the loops ran and produced stable logs. They are not enough to prove visual quality."),
+        md("## 1. Training loss curves\n\nThese curves check that the loops ran and produced stable logs. They are not enough to prove visual quality, but they are needed before interpreting samples: a broken optimization loop would make every later visual comparison meaningless.\n\n**What to look for:** the curves should move smoothly enough to show that each family is learning something. The limitation is that loss scale differs by family, so the curves compare stability, not final image quality."),
        code("""\
 runs = {n: load_log(n) for n in ["p0_wgan", "p0_vae", "p0_ddpm", "p0_ddpm_small"]}
 runs = {k: v for k, v in runs.items() if v}
@@ -181,7 +200,7 @@ axes[2].set_xlabel("Epoch"); axes[2].legend()

 plt.tight_layout(); plt.show()
 """),
-        md("## 2. Final sample grids\n\nThe final previews show the practical failure mode of the raw pipeline: the samples have some face-like structure, but identity, alignment, and detail are not under control. These PNGs are displayed exactly as saved, so older Phase 0 matrices keep their original layout instead of being forced into 4x4."),
+        md("## 2. Final sample grids\n\nThe final previews show the practical failure mode of the raw pipeline: the samples have some face-like structure, but identity, alignment, and detail are not under control. These PNGs are displayed exactly as saved, so older Phase 0 matrices keep their original layout instead of being forced into 4x4.\n\n**Why this matters:** this is the visual evidence that the first bottleneck is not only the model family. The data still contains too much pose, scale, and background variation for tiny baseline recipes."),
        code("""\
 last_epochs = {"p0_wgan": 200, "p0_vae": 100, "p0_ddpm": 200, "p0_ddpm_small": 100}

@@ -196,7 +215,7 @@ for ax, (name, ep) in zip(axes, last_epochs.items()):
    ax.axis("off")
 plt.tight_layout(); plt.show()
 """),
-        md("## 3. Progression - early vs late\n\nThe progression grids make the baseline failure visible over time. Later samples improve slightly, but the raw input distribution keeps the task too broad. The saved matrices are shown in their original layout."),
+        md("## 3. Progression - early vs late\n\nThe progression grids make the baseline failure visible over time. Later samples improve slightly, but the raw input distribution keeps the task too broad. The saved matrices are shown in their original layout.\n\n**How to read it:** if more epochs only turn noise into rough face-like blobs, the next decision should be pipeline cleanup rather than simply training the same recipe longer."),
        code("""\
 checkpoints = {
    "p0_wgan":       [50, 100, 200],
@@ -231,7 +250,7 @@ locks the pipeline before the project spends more compute on stronger recipes.
 establishes the baseline failure and motivates the move to aligned face crops.
 """),
    ]
-    write_nb("phase0_analysis", cells)
+    write_nb(NOTEBOOK_SEQUENCE["phase0"], cells)


 # PHASE 1 - Pipeline ablations with a DCGAN proxy
@@ -246,7 +265,7 @@ def build_phase1():

    cells = [
        md(f"""\
-# Phase 1 - Pipeline Selection
+# 02 - Pipeline Selection

 Phase 1 answers the data-handling question left open by the baseline. Instead
 of changing the model family, it uses a cheap DCGAN proxy and varies one
@@ -279,7 +298,7 @@ without any pipeline tuning, and it also collapsed. Phase 1 below uses the same
 with the data pipeline systematically varied; the architecture limitation is constant.
 """),
        code(SHARED_IMPORTS),
-        md("## 1. Load all experiment logs\n\nAll evidence in this notebook comes from the existing Phase 1 logs and sample folders."),
+        md("## 1. Load all experiment logs\n\nAll evidence in this notebook comes from the existing Phase 1 logs and sample folders. The cell is intentionally simple: it only lists the experiments that are already saved, so the reader knows which pipeline ablations are being compared."),
        code("""\
 run_names = sorted(p.stem for p in LOGS.glob("p1*.json"))
 runs = {name: load_log(name) for name in run_names}
@@ -300,7 +319,7 @@ experiment_groups = {
                              "p1d_dcgan_combined": "Aligned + raw mixed"},
 }
 """),
-        md("## 2. FID comparison table\n\nThe table ranks the proxy runs. The values are useful within Phase 1, but they should not be compared directly with later FID protocols."),
+        md("## 2. FID comparison table\n\nThe table ranks the proxy runs. It is needed because the visual samples alone can be misleading: a run can look slightly better in one grid while still being worse across the saved distribution. The values are useful within Phase 1, but they should not be compared directly with later FID protocols."),
        code("""\
 rows = []
 for name in run_names:
@@ -332,7 +351,7 @@ ax.set_title("Phase 1 - FID across all pipeline ablations")
 ax.set_xticks(x); ax.set_xticklabels(labels, rotation=30, ha="right")
 ax.legend(); plt.tight_layout(); plt.show()
 """),
-        md("## 3. Controlled ablation results\n\nEach subplot holds the model approximately fixed and changes one pipeline factor. This is the decision evidence for the rest of the generator suite."),
+        md("## 3. Controlled ablation results\n\nEach subplot holds the model approximately fixed and changes one pipeline factor. This is the decision evidence for the rest of the generator suite: alignment, resolution, augmentation, and dataset mixing are treated as pipeline choices, not as disconnected experiments.\n\n**What to look for:** a useful pipeline change should lower FID consistently inside its ablation group, not only produce one nicer-looking example."),
        code("""\
 fig, axes = plt.subplots(2, 2, figsize=(14, 10))
 axes = axes.flatten()
@@ -356,6 +375,8 @@ plt.tight_layout(); plt.show()
 ## 4. Data pipeline visualization

 What each ablation actually changes, shown on the input data the model sees.
+These figures are not model outputs. They explain the input distribution that
+each model has to learn, which is why they sit next to the ablation results.
 """),
        code("""\
 import random
@@ -399,7 +420,7 @@ def show_unavailable(ax, message):
    ax.text(0.5, 0.5, message, ha="center", va="center", wrap=True, transform=ax.transAxes)
    ax.axis("off")
 """),
-        md("### 4A - Resolution\n\nSame raw image at 64x64 and 128x128. This is a paired comparison layout, so it keeps the original 2x4 format instead of being forced into a 4x4 sample grid."),
+        md("### 4A - Resolution\n\nSame raw image at 64x64 and 128x128. This is a paired comparison layout, so it keeps the original 2x4 format instead of being forced into a 4x4 sample grid.\n\n**Interpretation:** 128x128 carries more detail, but it also makes the proxy generator learn a harder distribution. The later decision favors 64x64 because stable face structure matters more than nominal resolution at this budget."),
        code("""\
 paths = sample_paths(RAW, k=4)
 fig, axes = plt.subplots(2, 4, figsize=(12, 6))
@@ -414,7 +435,7 @@ else:
 fig.suptitle("1A - Resolution: same image at two scales", fontsize=12, fontweight="bold")
 plt.tight_layout(); plt.show()
 """),
-        md("### 4B - Alignment\n\nRaw vs MTCNN-aligned 64x64 crops. This paired layout keeps the original 2x4 format so each raw image is directly above its aligned crop."),
+        md("### 4B - Alignment\n\nRaw vs MTCNN-aligned 64x64 crops. This paired layout keeps the original 2x4 format so each raw image is directly above its aligned crop.\n\n**Interpretation:** alignment removes background and scale variation before the generator spends capacity on it. This is why alignment becomes the strongest pipeline lever."),
        code("""\
 pairs = matched_pairs(k=4)
 fig, axes = plt.subplots(2, 4, figsize=(12, 6))
@@ -429,7 +450,7 @@ else:
 fig.suptitle("1B - Alignment: same source image, raw vs MTCNN-aligned", fontsize=12, fontweight="bold")
 plt.tight_layout(); plt.show()
 """),
-        md("### 4C - Augmentation\n\nOne aligned image, then deterministic examples of the saved augmentation idea. This keeps the original compact strip because the point is to compare transforms on one image, not to make a generated 4x4 sample matrix."),
+        md("### 4C - Augmentation\n\nOne aligned image, then deterministic examples of the saved augmentation idea. This keeps the original compact strip because the point is to compare transforms on one image, not to make a generated 4x4 sample matrix.\n\n**Interpretation:** augmentation can make the training distribution broader, but it can also blur already scarce structure. Phase 1 treats it as a pipeline setting to justify, not as an automatic improvement."),
        code("""\
 src = sample_paths(ALIGNED, k=1)
 if src:
@@ -454,7 +475,7 @@ else:
    fig.suptitle("1C - Augmentation", fontsize=12, fontweight="bold")
    plt.tight_layout(); plt.show()
 """),
-        md("### 4D - Dataset mixing\n\nMixing raw and aligned images asks one generator to model two different input distributions. This keeps the original paired 2x4 layout so the contrast is easy to read."),
+        md("### 4D - Dataset mixing\n\nMixing raw and aligned images asks one generator to model two different input distributions. This keeps the original paired 2x4 layout so the contrast is easy to read.\n\n**Interpretation:** mixing increases nuisance variation and makes the generator solve two problems at once. The later phases therefore inherit aligned-only data."),
        code("""\
 pairs = matched_pairs(k=4)
 fig, axes = plt.subplots(2, 4, figsize=(12, 6))
@@ -492,7 +513,7 @@ decision. Alignment is the main fix; Phase 2 can now focus on the GAN recipe
 instead of fighting raw-image variance.
 """),
    ]
-    write_nb("phase1_analysis", cells)
+    write_nb(NOTEBOOK_SEQUENCE["phase1"], cells)


 # PHASE 2 - GAN architecture and objective evolution
@@ -504,7 +525,7 @@ def build_phase2():

    cells = [
        md(f"""\
-# Phase 2 - GAN Progression
+# 03 - GAN Stability Progression

 Phase 2 keeps the Phase 1 pipeline fixed and changes the GAN recipe. This makes
 the question narrow: once the data is aligned, what model changes are needed to
@@ -591,7 +612,7 @@ df = pd.DataFrame(rows).sort_values("Best FID")
 df.style.format({"FID@25": "{:.1f}", "FID@50": "{:.1f}", "FID@100": "{:.1f}",
                 "Best FID": "{:.1f}", "Train (min)": "{:.1f}"})
 """),
-        md("## 3. FID curves - progression"),
+        md("## 3. FID curves - progression\n\nThis plot shows whether improvements happen gradually or as a step change. It is needed because the final FID table hides training dynamics: here the key story is that the 2.3 stability package changes the whole trajectory, while 2.1 and 2.2 remain collapsed."),
        code("""\
 fig, ax = plt.subplots(figsize=(10, 5))
 cmap = plt.cm.viridis
@@ -643,7 +664,7 @@ for ax, name in zip(axes, run_names):
    show_image_or_missing(ax, img_path, title)
 plt.tight_layout(); plt.show()
 """),
-        md("## 6. Progression - epoch 10 -> 50 -> 100"),
+        md("## 6. Progression - epoch 10 -> 50 -> 100\n\nThese panels connect time to visual quality. For the collapsed runs, the gray grids are still information: they show that more epochs did not fix the recipe. For the stabilized run, the same timeline shows recognizable faces emerging."),
        code("""\
 check_epochs = [10, 50, 100]
 for name in run_names:
@@ -659,7 +680,7 @@ for name in run_names:
    fig.suptitle(run_labels[name], fontsize=11, fontweight="bold")
    plt.tight_layout(); plt.show()
 """),
-        md("## 7. Pairwise comparison - what each step bought us"),
+        md("## 7. Pairwise comparison - what each step bought us\n\nEach pair isolates one decision. Rather than only asserting that the final GAN is better, the comparison shows that Wasserstein loss alone is insufficient, that the stability package is decisive, and that 128x128 is premature at the compute budget of the saved runs."),
        code("""\
 transitions = [
    ("2.1 -> 2.2: BCE -> Wasserstein",      "p2_1_dcgan",         "p2_2_wgan"),
@@ -702,7 +723,7 @@ usable generator recipe, but it also shows that higher resolution is not helpful
 without enough training budget.
 """),
    ]
-    write_nb("phase2_analysis", cells)
+    write_nb(NOTEBOOK_SEQUENCE["phase2"], cells)


 # PHASE 3 - VAE composite-loss evolution
@@ -713,7 +734,7 @@ def build_phase3():

    cells = [
        md(f"""\
-# Phase 3 - VAE Progression
+# 04 - VAE Loss Progression

 Phase 3 studies the VAE family after the pipeline has been locked. The baseline
 VAE is fast and stable, but its MSE + KL objective tends to average away facial
@@ -775,7 +796,7 @@ df = pd.DataFrame(rows).sort_values("Best FID")
 df.style.format({"FID@50": "{:.1f}", "FID@100": "{:.1f}", "Best FID": "{:.1f}",
                 "Recon@100": "{:.4f}", "KL@100": "{:.2f}", "Train (min)": "{:.1f}"})
 """),
-        md("## 3. FID curves - progression"),
+        md("## 3. FID curves - progression\n\nThe curves show how each extra loss changes the generation trajectory, not just the final checkpoint. A useful VAE loss should improve prior-sample FID while preserving the stable behavior that makes VAEs attractive."),
        code("""\
 fig, ax = plt.subplots(figsize=(10, 5))
 cmap = plt.cm.plasma
@@ -805,7 +826,7 @@ axes[1].set_title("KL divergence");        axes[1].set_xlabel("Epoch"); axes[1].
 axes[2].set_title("Perceptual (VGG16)");   axes[2].set_xlabel("Epoch"); axes[2].legend(fontsize=8)
 plt.tight_layout(); plt.show()
 """),
-        md("## 5. Prior samples - epoch 100"),
+        md("## 5. Prior samples - epoch 100\n\nThese are generated samples from the latent prior, so they answer the true generator question: if we sample a random latent vector, do we get plausible faces? This is different from reconstruction quality."),
        code("""\
 fig, axes = plt.subplots(1, 3, figsize=(13, 4.5))
 for ax, name in zip(axes, run_names):
@@ -818,7 +839,7 @@ for ax, name in zip(axes, run_names):
 fig.suptitle("Prior samples (decoded from N(0, I))", fontsize=12, fontweight="bold")
 plt.tight_layout(); plt.show()
 """),
-        md("## 6. Reconstructions - epoch 100"),
+        md("## 6. Reconstructions - epoch 100\n\nReconstructions show whether the encoder-decoder still preserves real input structure. They are useful as a diagnostic, but they are not the final generation metric because reconstructing a known image is easier than sampling a new one."),
        code("""\
 fig, axes = plt.subplots(1, 3, figsize=(13, 4.5))
 for ax, name in zip(axes, run_names):
@@ -830,7 +851,7 @@ for ax, name in zip(axes, run_names):
 fig.suptitle("Reconstructions (real / decoded interleaved)", fontsize=12, fontweight="bold")
 plt.tight_layout(); plt.show()
 """),
-        md("## 7. Progression - epoch 10 -> 50 -> 100 (prior samples)"),
+        md("## 7. Progression - epoch 10 -> 50 -> 100 (prior samples)\n\nThe timeline shows how the sampled faces change as the loss stack trains. The limitation remains visible: the VAE becomes more structured and detailed, but it still tends toward smoother faces than GAN or DDPM samples."),
        code("""\
 check_epochs = [10, 50, 100]
 for name in run_names:
@@ -867,7 +888,7 @@ complementary losses, but even the selected recipe remains a speed-oriented
 family rather than the strongest quality candidate.
 """),
    ]
-    write_nb("phase3_analysis", cells)
+    write_nb(NOTEBOOK_SEQUENCE["phase3"], cells)


 # PHASE 4 - DDPM schedule, target, and width evolution
@@ -879,7 +900,7 @@ def build_phase4():

    cells = [
        md(f"""\
-# Phase 4 - DDPM Progression
+# 05 - DDPM Recipe Progression

 Phase 4 applies the same report logic to diffusion models. The pipeline is
 already fixed, so this notebook isolates the DDPM recipe: schedule, prediction
@@ -957,7 +978,7 @@ df = pd.DataFrame(rows).sort_values("Best FID")
 df.style.format({"FID@25": "{:.1f}", "FID@50": "{:.1f}", "FID@100": "{:.1f}",
                 "Best FID": "{:.1f}", "Loss@100": "{:.4f}", "Train (min)": "{:.1f}"})
 """),
-        md("## 3. FID curves - progression"),
+        md("## 3. FID curves - progression\n\nThe curves make the DDPM recipe evolution visible. The main evidence is not only that the wider final model wins, but that the largest FID drop happens when the prediction target changes to v-prediction."),
        code("""\
 fig, ax = plt.subplots(figsize=(10, 5))
 cmap = plt.cm.cividis
@@ -983,7 +1004,7 @@ ax.set_xlabel("Epoch"); ax.set_ylabel("MSE on prediction target")
 ax.set_title("Loss (epsilon-MSE and v-MSE are not directly comparable)")
 ax.legend(); plt.tight_layout(); plt.show()
 """),
-        md("## 5. Sample grids - epoch 100"),
+        md("## 5. Sample grids - epoch 100\n\nThese grids show the qualitative side of the FID drop. They should be read as independent samples from each checkpoint, with attention to global face structure, texture noise, and artifact frequency."),
        code("""\
 fig, axes = plt.subplots(1, 4, figsize=(16, 4.5))
 for ax, name in zip(axes, run_names):
@@ -1054,7 +1075,7 @@ base_ch=192, and attention at 32/16/8 for the final comparison.
 into the strongest quality candidate for Phase 5.
 """),
    ]
-    write_nb("phase4_analysis", cells)
+    write_nb(NOTEBOOK_SEQUENCE["phase4"], cells)


 # PHASE 5 - Cross-family final comparison
@@ -1069,7 +1090,7 @@ def build_phase5():

    cells = [
        md(f"""\
-# Phase 5 - Final Comparison
+# 06 - Final Family Comparison

 Phase 5 is the project-level comparison. It loads the already trained best
 recipes from the GAN, VAE, and DDPM branches and compares their saved logs,
@@ -1136,7 +1157,7 @@ ax.set_title("Final comparison: quality vs training time")
 ax.grid(alpha=0.25)
 plt.tight_layout(); plt.show()
 """),
-        md("## 3. FID curves - all three families"),
+        md("## 3. FID curves - all three families\n\nThis plot puts the selected family recipes on one timeline. It is needed because the best final FID alone does not show convergence behavior: DDPM reaches the best quality, GAN remains close with less time, and VAE is fast but saturates at a higher FID."),
        code("""\
 fig, ax = plt.subplots(figsize=(10, 5))
 for fam, info in FAMILIES.items():
@@ -1222,6 +1243,11 @@ Smooth interpolation between two latent codes reveals whether the generator has
 learned a continuous manifold rather than a sparse memorisation. DDPM has no
 encoder, so this section is GAN/VAE only.

+The interpolation figures are not a ranking metric. They are included to make
+the learned representation easier to inspect: smooth transitions support the
+claim that the models learned a face manifold, while sudden jumps would suggest
+fragile or memorised structure.
+
 **Checkpoint loading note:** the cell below uses the same priority as
 `tools/sampling.py`: `final_ema` first, then `best_ema` as fallback. This avoids
 using a best-FID EMA snapshot that may have been saved very early for a
@@ -1340,14 +1366,14 @@ The final comparison supports DDPM as the best-quality generator and GAN as the
 best practical compromise.
 """),
    ]
-    write_nb("phase5_analysis", cells)
+    write_nb(NOTEBOOK_SEQUENCE["phase5"], cells)


 # PHASE 6 - Final selected sample showcase
 def build_phase6():
    cells = [
        md("""\
-# Phase 6 - Final Selected Samples
+# 07 - Final Sample Showcase

 This final notebook is a small showcase chapter. Phase 5 compared the model
 families quantitatively; this notebook selects the three strongest individual
@@ -1565,11 +1591,12 @@ large generated pool, so they should be used as final qualitative examples, not
 as a replacement for the full distribution-level metrics.
 """),
    ]
-    write_nb("phase6_final_showcase", cells)
+    write_nb(NOTEBOOK_SEQUENCE["phase6"], cells)


 if __name__ == "__main__":
    print("Building notebooks...")
+    remove_old_generated_notebooks()
    build_phase0()
    build_phase1()
    build_phase2()