Generator notebooks with proper names + README

This commit is contained in:
DiogoCosta18
2026-05-14 20:20:46 +01:00
parent f46320f81e
commit 3bff7eefb0
9 changed files with 1127 additions and 504 deletions
+148 -107
@@ -1,125 +1,166 @@
# DRL_PROJ — DeepFake Detection # Deep Learning Face Project
Deep learning project for binary deepfake detection on the DeepFakeFace dataset. This repository contains a two-part deep learning project on the
DeepFakeFace (DFF) dataset:
## Project structure 1. **Classifier:** detect whether a face image is real or fake.
2. **Generator:** train generative models that produce new fake face images.
``` The project is written as an experimental report. The notebooks are the main
deliverable: they show the pipeline, the intermediate failures, the ablations,
the decisions, and the final models. Read them in order.
## Project Story
The work follows the same principle in both parts: start with a simple
baseline, inspect what fails, change one important factor at a time, and keep
the evidence tied to saved logs and saved artifacts.
For the **classifier**, the story moves from dataset understanding to
preprocessing, baseline models, controlled ablations, Grad-CAM inspection,
stronger model families, and data scaling. The final practical classifier is a
ResNet50-style pipeline using face crops, 224x224 inputs, ImageNet/default
normalization, and no stochastic augmentation at validation/test time.
For the **generator**, the story starts with raw baseline failures, then locks
the data pipeline before comparing three parallel model-family branches:
GAN, VAE, and DDPM. The final comparison keeps quality versus speed central:
DDPM gives the best saved FID and visual quality, GAN is the best
quality-speed compromise, and VAE is the fastest but smoothest option.
## How To Read The Project
Start with the classifier notebooks, then read the generator notebooks. The
generator has one linear setup stage followed by three parallel branches:
GAN, VAE, and DDPM. Those branches are numbered in reading order, but they are
conceptually parallel experiments after the pipeline is selected.
### Classifier Notebooks
Read these first:
1. `classifier/notebooks/01_eda.ipynb`
Dataset composition, real/fake source mapping, image statistics, and
shortcut risks.
2. `classifier/notebooks/02_preprocessing.ipynb`
Deterministic preprocessing, train-only augmentation, face crops, and
normalization.
3. `classifier/notebooks/03_phase1_analysis.ipynb`
SimpleCNN and ResNet18 controlled baselines.
4. `classifier/notebooks/04_phase2_analysis.ipynb`
Resolution, normalization, source holdouts, facecrop, and augmentation
ablations.
5. `classifier/notebooks/05_gradcam_analysis.ipynb`
Qualitative localization analysis across the classifier pipeline.
6. `classifier/notebooks/06_phase3_model_family_analysis.ipynb`
Stronger pretrained model families and the ResNet50 practical choice.
7. `classifier/notebooks/07_phase4_data_scaling_analysis.ipynb`
Data scaling for strong backbones and the final classifier decision.
### Generator Notebooks
Read these after the classifier:
1. `generator/notebooks/01_baseline_sanity_check.ipynb`
Raw baseline failures and why the data pipeline must be fixed first.
2. `generator/notebooks/02_pipeline_selection.ipynb`
Controlled pipeline ablations: resolution, alignment, augmentation, and
raw/aligned mixing.
3. `generator/notebooks/03_gan_stability_progression.ipynb`
GAN branch: DCGAN -> WGAN-GP -> spectral normalization + GroupNorm +
self-attention -> 128x128 check.
4. `generator/notebooks/04_vae_loss_progression.ipynb`
VAE branch: MSE + KL -> perceptual loss -> PatchGAN adversarial loss.
5. `generator/notebooks/05_ddpm_recipe_progression.ipynb`
DDPM branch: linear schedule -> cosine schedule -> v-prediction -> wider
backbone.
6. `generator/notebooks/06_final_family_comparison.ipynb`
Final comparison of the selected GAN, VAE, and DDPM recipes under saved
Phase 5 conditions.
7. `generator/notebooks/07_final_sample_showcase.ipynb`
Curated final sample examples from saved outputs. This is qualitative
showcase material, not a replacement for FID.
## What The Notebooks Do
The notebooks are analysis/report chapters. They load existing configs, logs,
figures, saved sample grids, checkpoints, and prediction summaries. They are
not intended to launch new training runs.
When a notebook shows a plot or image grid, the surrounding markdown explains:
- what the artifact shows;
- why it is needed;
- how it supports the phase decision;
- what limitation remains.
This is important because the project is evaluated not only by final
performance, but by the documented evolution of the solution.
## Repository Layout
```text
DRL_PROJ/ DRL_PROJ/
classifier/ ← discriminative model (real vs. fake classifier) classifier/
src/ ← model definitions, training, evaluation, preprocessing configs/ experiment configs by phase
configs/ ← experiment configs organised by phase notebooks/ classifier report notebooks
phase1/ ← baseline models (SimpleCNN, ResNet18) outputs/ saved logs, figures, Grad-CAM panels, checkpoints
phase2/ ← architecture sweep (ResNet variants, face-crop) src/ classifier data, models, training, evaluation
phase3/ ← EfficientNet, ViT, frequency-aware training tools/ facecrop, Grad-CAM, inference, reevaluation helpers
phase4/ ← ensemble strategies
tools/ ← analyse.py, ensemble.py, inference.py, facecrop.py generator/
notebooks/ ← EDA, preprocessing, evaluation, GradCAM configs/ generator configs by phase/family
outputs/ ← models, logs, figures (gitignored except .pt/.json) notebooks/ generator report notebooks and notebook builder
run.py ← main training entry point outputs/ saved logs, sample grids, final showcase artifacts
generator/ generative model (GAN / VAE / diffusion) — in progress src/ generator data, models, training, metrics
pipeline/ ← Vast.ai ephemeral GPU orchestration tools/ sampling and utility scripts
data/ ← dataset root (gitignored)
cropped/ ← MTCNN pre-cropped faces (gitignored) data/ original DFF dataset root, not committed
classifier/ ← bbox crops for the classifier cropped/ preprocessed face crops, not committed
generator/ ← landmark-aligned crops for the generator docs/ project statement and supporting documents
pipeline/ optional remote/GPU orchestration helpers
``` ```
## Setup ## Rebuilding The Generator Notebooks
Create a local environment when you want to run the code directly on a machine you control: The generator notebooks are generated from a single source file:
```bash ```powershell
python3 -m venv .venv cd generator/notebooks
source .venv/bin/activate python _build.py
python -m pip install --upgrade pip setuptools wheel
python -m pip install -r requirements.txt
``` ```
## Local Training That builder writes the numbered generator notebooks listed above. It uses
existing saved logs and artifacts; it does not train models.
```bash ## Running The Code
python3 classifier/run.py classifier/configs/phase2/p2_resnet18_facecrop.json
python3 classifier/run.py classifier/configs/phase3/p3_efficientnet_b0.json Create an environment and install the project requirements:
```powershell
python -m venv .venv
.\.venv\Scripts\python.exe -m pip install --upgrade pip setuptools wheel
.\.venv\Scripts\python.exe -m pip install -r requirements.txt
``` ```
## Ephemeral Vast.ai Pipeline The raw dataset should be placed under `data/`. Preprocessed crops are stored
under `cropped/`. These folders are intentionally not committed.
The deployment/orchestration path now lives under [`pipeline/`](/run/host/mnt/shared/UP/DRL/DRL_PROJ/pipeline/README.md). Execution entry points exist in `classifier/run.py` and `generator/run.py` for
reproducibility, but the report notebooks should be read as analysis over
already saved results.
One-time setup: ## Final Takeaway
```bash The project is best understood as a sequence of controlled decisions:
cat > pipeline/.env <<'EOF'
VAST_API_KEY=<your-api-key>
VAST_SSH_PRIVATE_KEY=/home/your-user/.ssh/id_ed25519
EOF
```
End-to-end ephemeral run: 1. cleanly define the data and preprocessing;
2. establish simple baselines;
3. improve one factor at a time;
4. compare model families using saved evidence;
5. report both performance and limitations.
```bash The classifier becomes reliable through source-aware preprocessing, stronger
python3 -m pipeline run classifier/configs/phase2/p2_resnet18_facecrop.json --upload-data pretrained backbones, and scaling. The generator improves by first locking the
``` face-aligned pipeline and then selecting the best recipe inside each model
family before the final GAN/VAE/DDPM comparison.
Interactive offer selection:
```bash
python3 -m pipeline offers --select-offer
```
You can override the ranking mode per run:
```bash
python3 -m pipeline offers --sort price
python3 -m pipeline offers --sort performance
python3 -m pipeline offers --sort performance --price 0.14
```
You can also filter by region:
```bash
python3 -m pipeline offers --select-offer --region europe
python3 -m pipeline offers --select-offer --region Portugal
python3 -m pipeline offers --select-offer --region US
python3 -m pipeline offers --select-offer --region europe --price 0.14
```
To inspect which region strings are currently available from the search results:
```bash
python3 -m pipeline offers --list-regions
```
That command:
- ensures your SSH public key is registered with Vast.ai
- searches offers using the filters in `pipeline/defaults/vast.json`
- creates an instance
- waits for SSH readiness
- syncs the repo
- uploads `data/` when `--upload-data` is set
- runs `python3 classifier/run.py ...`
- downloads `classifier/outputs/`
- for generator runs, rsyncs `generator/outputs/` back every 25 epochs and again at completion
- destroys the instance automatically unless `--keep-on-failure` is set
Useful commands:
```bash
python3 -m pipeline up
python3 -m pipeline status <instance_id>
python3 -m pipeline down <instance_id>
```
To override the default Vast search/runtime settings, copy `pipeline/defaults/vast.json`, edit it, and pass:
```bash
python3 -m pipeline run classifier/configs/phase3/p3_efficientnet_b0.json --pipeline-config /path/to/vast.override.json
```
The default policy in `pipeline/defaults/vast.json` now targets:
- `1x` GPU
- `RTX 3090` or `RTX 3090 Ti`
- `<= $0.20/hour`
- sorted by `dlperf` descending
- uses `vastai/pytorch:latest` as the default image
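
The new README text in this diff describes the final classifier as using 224x224 face crops, "ImageNet/default normalization", and no stochastic augmentation at validation/test time. A minimal sketch of what that deterministic evaluation step could look like is given below. The mean and standard deviation values are the standard ImageNet channel statistics. Whether the project uses exactly these constants is an assumption, and `normalize_pixel` and `preprocess_eval` are hypothetical names, not functions from `classifier/src`.

```python
# Standard ImageNet channel statistics (RGB order, for inputs scaled to [0, 1]).
# Assumption: the project's "ImageNet/default normalization" uses these values.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Map one 8-bit RGB triple to ImageNet-normalized floats."""
    return tuple(
        (channel / 255.0 - mean) / std
        for channel, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


def preprocess_eval(pixels):
    """Hypothetical eval-time preprocessing: normalization only.

    No random flips, crops, or color jitter are applied, so the same
    input always yields the same output.
    """
    return [normalize_pixel(p) for p in pixels]


sample = [(0, 0, 0), (255, 255, 255), (124, 116, 104)]
first_pass = preprocess_eval(sample)
second_pass = preprocess_eval(sample)

# Deterministic preprocessing: two passes over the same data agree exactly.
print(first_pass == second_pass)
```

Keeping validation/test preprocessing free of randomness means that differences between two evaluation runs come from the model, not from the input pipeline.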
@@ -2,9 +2,10 @@
"cells": [ "cells": [
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "b6a7c89b",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Phase 0 - Baseline Sanity Check\n", "# 01 - Baseline Sanity Check\n",
"\n", "\n",
"Phase 0 is the starting point of the generator story. It uses the raw, un-aligned\n", "Phase 0 is the starting point of the generator story. It uses the raw, un-aligned\n",
"images and very plain versions of each model family so we can confirm that the\n", "images and very plain versions of each model family so we can confirm that the\n",
@@ -33,7 +34,15 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 1, "execution_count": 1,
"metadata": {}, "id": "c354bb59",
"metadata": {
"execution": {
"iopub.execute_input": "2026-05-14T19:07:52.326287Z",
"iopub.status.busy": "2026-05-14T19:07:52.325284Z",
"iopub.status.idle": "2026-05-14T19:08:01.950318Z",
"shell.execute_reply": "2026-05-14T19:08:01.947303Z"
}
},
"outputs": [], "outputs": [],
"source": [ "source": [
"import json\n", "import json\n",
@@ -92,17 +101,28 @@
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "a6c786f4",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## 1. Training loss curves\n", "## 1. Training loss curves\n",
"\n", "\n",
"These curves check that the loops ran and produced stable logs. They are not enough to prove visual quality." "These curves check that the loops ran and produced stable logs. They are not enough to prove visual quality, but they are needed before interpreting samples: a broken optimization loop would make every later visual comparison meaningless.\n",
"\n",
"**What to look for:** the curves should move smoothly enough to show that each family is learning something. The limitation is that loss scale differs by family, so the curves compare stability, not final image quality."
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 2, "execution_count": 2,
"metadata": {}, "id": "47441617",
"metadata": {
"execution": {
"iopub.execute_input": "2026-05-14T19:08:01.960833Z",
"iopub.status.busy": "2026-05-14T19:08:01.958829Z",
"iopub.status.idle": "2026-05-14T19:08:03.894140Z",
"shell.execute_reply": "2026-05-14T19:08:03.891170Z"
}
},
"outputs": [ "outputs": [
{ {
"data": { "data": {
@@ -152,17 +172,28 @@
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "836428fe",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## 2. Final sample grids\n", "## 2. Final sample grids\n",
"\n", "\n",
"The final previews show the practical failure mode of the raw pipeline: the samples have some face-like structure, but identity, alignment, and detail are not under control. These PNGs are displayed exactly as saved, so older Phase 0 matrices keep their original layout instead of being forced into 4x4." "The final previews show the practical failure mode of the raw pipeline: the samples have some face-like structure, but identity, alignment, and detail are not under control. These PNGs are displayed exactly as saved, so older Phase 0 matrices keep their original layout instead of being forced into 4x4.\n",
"\n",
"**Why this matters:** this is the visual evidence that the first bottleneck is not only the model family. The data still contains too much pose, scale, and background variation for tiny baseline recipes."
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 3, "execution_count": 3,
"metadata": {}, "id": "9389ea9c",
"metadata": {
"execution": {
"iopub.execute_input": "2026-05-14T19:08:03.902776Z",
"iopub.status.busy": "2026-05-14T19:08:03.901424Z",
"iopub.status.idle": "2026-05-14T19:08:05.698092Z",
"shell.execute_reply": "2026-05-14T19:08:05.693983Z"
}
},
"outputs": [ "outputs": [
{ {
"data": { "data": {
@@ -192,17 +223,28 @@
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "b596f509",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## 3. Progression - early vs late\n", "## 3. Progression - early vs late\n",
"\n", "\n",
"The progression grids make the baseline failure visible over time. Later samples improve slightly, but the raw input distribution keeps the task too broad. The saved matrices are shown in their original layout." "The progression grids make the baseline failure visible over time. Later samples improve slightly, but the raw input distribution keeps the task too broad. The saved matrices are shown in their original layout.\n",
"\n",
"**How to read it:** if more epochs only turn noise into rough face-like blobs, the next decision should be pipeline cleanup rather than simply training the same recipe longer."
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 4, "execution_count": 4,
"metadata": {}, "id": "01959758",
"metadata": {
"execution": {
"iopub.execute_input": "2026-05-14T19:08:05.750674Z",
"iopub.status.busy": "2026-05-14T19:08:05.748669Z",
"iopub.status.idle": "2026-05-14T19:08:09.444392Z",
"shell.execute_reply": "2026-05-14T19:08:09.441869Z"
}
},
"outputs": [ "outputs": [
{ {
"data": { "data": {
@@ -257,6 +299,7 @@
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "85964677",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## 4. What this phase proves\n", "## 4. What this phase proves\n",
File diff suppressed because one or more lines are too long
@@ -2,9 +2,10 @@
"cells": [ "cells": [
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "b5ec2417",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Phase 2 - GAN Progression\n", "# 03 - GAN Stability Progression\n",
"\n", "\n",
"Phase 2 keeps the Phase 1 pipeline fixed and changes the GAN recipe. This makes\n", "Phase 2 keeps the Phase 1 pipeline fixed and changes the GAN recipe. This makes\n",
"the question narrow: once the data is aligned, what model changes are needed to\n", "the question narrow: once the data is aligned, what model changes are needed to\n",
@@ -29,6 +30,7 @@
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "e8e9b53c",
"metadata": {}, "metadata": {},
"source": [ "source": [
"> ### FID is not comparable across phases\n", "> ### FID is not comparable across phases\n",
@@ -49,6 +51,7 @@
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "cc979d85",
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Reference: Phase 0 baseline from the same family\n", "### Reference: Phase 0 baseline from the same family\n",
@@ -60,9 +63,16 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 9, "execution_count": 1,
"id": "bf821370", "id": "a352836b",
"metadata": {}, "metadata": {
"execution": {
"iopub.execute_input": "2026-05-14T19:08:45.741982Z",
"iopub.status.busy": "2026-05-14T19:08:45.741982Z",
"iopub.status.idle": "2026-05-14T19:08:47.336989Z",
"shell.execute_reply": "2026-05-14T19:08:47.334437Z"
}
},
"outputs": [], "outputs": [],
"source": [ "source": [
"import json\n", "import json\n",
@@ -75,6 +85,12 @@
"\n", "\n",
"plt.rcParams.update({\"figure.dpi\": 120, \"font.size\": 10})\n", "plt.rcParams.update({\"figure.dpi\": 120, \"font.size\": 10})\n",
"\n", "\n",
"try:\n",
" display\n",
"except NameError:\n",
" def display(obj):\n",
" print(obj)\n",
"\n",
"def find_generator_root():\n", "def find_generator_root():\n",
" for base in [Path.cwd(), *Path.cwd().parents]:\n", " for base in [Path.cwd(), *Path.cwd().parents]:\n",
" for candidate in [base, base / \"generator\"]:\n", " for candidate in [base, base / \"generator\"]:\n",
@@ -115,7 +131,7 @@
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "f627af73", "id": "e1871b79",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## 1. Load experiment logs\n", "## 1. Load experiment logs\n",
@@ -125,9 +141,16 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 10, "execution_count": 2,
"id": "59f61b4e", "id": "32a2b843",
"metadata": {}, "metadata": {
"execution": {
"iopub.execute_input": "2026-05-14T19:08:47.342985Z",
"iopub.status.busy": "2026-05-14T19:08:47.342985Z",
"iopub.status.idle": "2026-05-14T19:08:47.368505Z",
"shell.execute_reply": "2026-05-14T19:08:47.365495Z"
}
},
"outputs": [ "outputs": [
{ {
"name": "stdout", "name": "stdout",
@@ -163,7 +186,7 @@
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "c1bad44a", "id": "494c64aa",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## 2. FID comparison table\n", "## 2. FID comparison table\n",
@@ -173,72 +196,79 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 11, "execution_count": 3,
"id": "528d3bb2", "id": "72a04040",
"metadata": {}, "metadata": {
"execution": {
"iopub.execute_input": "2026-05-14T19:08:47.376020Z",
"iopub.status.busy": "2026-05-14T19:08:47.375017Z",
"iopub.status.idle": "2026-05-14T19:08:47.640711Z",
"shell.execute_reply": "2026-05-14T19:08:47.638190Z"
}
},
"outputs": [ "outputs": [
{ {
"data": { "data": {
"text/html": [ "text/html": [
"<style type=\"text/css\">\n", "<style type=\"text/css\">\n",
"</style>\n", "</style>\n",
"<table id=\"T_0b020\">\n", "<table id=\"T_cf90d\">\n",
" <thead>\n", " <thead>\n",
" <tr>\n", " <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n", " <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_0b020_level0_col0\" class=\"col_heading level0 col0\" >Run</th>\n", " <th id=\"T_cf90d_level0_col0\" class=\"col_heading level0 col0\" >Run</th>\n",
" <th id=\"T_0b020_level0_col1\" class=\"col_heading level0 col1\" >FID@25</th>\n", " <th id=\"T_cf90d_level0_col1\" class=\"col_heading level0 col1\" >FID@25</th>\n",
" <th id=\"T_0b020_level0_col2\" class=\"col_heading level0 col2\" >FID@50</th>\n", " <th id=\"T_cf90d_level0_col2\" class=\"col_heading level0 col2\" >FID@50</th>\n",
" <th id=\"T_0b020_level0_col3\" class=\"col_heading level0 col3\" >FID@100</th>\n", " <th id=\"T_cf90d_level0_col3\" class=\"col_heading level0 col3\" >FID@100</th>\n",
" <th id=\"T_0b020_level0_col4\" class=\"col_heading level0 col4\" >Best FID</th>\n", " <th id=\"T_cf90d_level0_col4\" class=\"col_heading level0 col4\" >Best FID</th>\n",
" <th id=\"T_0b020_level0_col5\" class=\"col_heading level0 col5\" >Train (min)</th>\n", " <th id=\"T_cf90d_level0_col5\" class=\"col_heading level0 col5\" >Train (min)</th>\n",
" </tr>\n", " </tr>\n",
" </thead>\n", " </thead>\n",
" <tbody>\n", " <tbody>\n",
" <tr>\n", " <tr>\n",
" <th id=\"T_0b020_level0_row0\" class=\"row_heading level0 row0\" >2</th>\n", " <th id=\"T_cf90d_level0_row0\" class=\"row_heading level0 row0\" >2</th>\n",
" <td id=\"T_0b020_row0_col0\" class=\"data row0 col0\" >2.3 + SN + Attn</td>\n", " <td id=\"T_cf90d_row0_col0\" class=\"data row0 col0\" >2.3 + SN + Attn</td>\n",
" <td id=\"T_0b020_row0_col1\" class=\"data row0 col1\" >274.4</td>\n", " <td id=\"T_cf90d_row0_col1\" class=\"data row0 col1\" >274.4</td>\n",
" <td id=\"T_0b020_row0_col2\" class=\"data row0 col2\" >223.2</td>\n", " <td id=\"T_cf90d_row0_col2\" class=\"data row0 col2\" >223.2</td>\n",
" <td id=\"T_0b020_row0_col3\" class=\"data row0 col3\" >110.1</td>\n", " <td id=\"T_cf90d_row0_col3\" class=\"data row0 col3\" >110.1</td>\n",
" <td id=\"T_0b020_row0_col4\" class=\"data row0 col4\" >110.1</td>\n", " <td id=\"T_cf90d_row0_col4\" class=\"data row0 col4\" >110.1</td>\n",
" <td id=\"T_0b020_row0_col5\" class=\"data row0 col5\" >39.0</td>\n", " <td id=\"T_cf90d_row0_col5\" class=\"data row0 col5\" >39.0</td>\n",
" </tr>\n", " </tr>\n",
" <tr>\n", " <tr>\n",
" <th id=\"T_0b020_level0_row1\" class=\"row_heading level0 row1\" >3</th>\n", " <th id=\"T_cf90d_level0_row1\" class=\"row_heading level0 row1\" >3</th>\n",
" <td id=\"T_0b020_row1_col0\" class=\"data row1 col0\" >2.4 + 128x128</td>\n", " <td id=\"T_cf90d_row1_col0\" class=\"data row1 col0\" >2.4 + 128x128</td>\n",
" <td id=\"T_0b020_row1_col1\" class=\"data row1 col1\" >428.6</td>\n", " <td id=\"T_cf90d_row1_col1\" class=\"data row1 col1\" >428.6</td>\n",
" <td id=\"T_0b020_row1_col2\" class=\"data row1 col2\" >264.3</td>\n", " <td id=\"T_cf90d_row1_col2\" class=\"data row1 col2\" >264.3</td>\n",
" <td id=\"T_0b020_row1_col3\" class=\"data row1 col3\" >186.0</td>\n", " <td id=\"T_cf90d_row1_col3\" class=\"data row1 col3\" >186.0</td>\n",
" <td id=\"T_0b020_row1_col4\" class=\"data row1 col4\" >186.0</td>\n", " <td id=\"T_cf90d_row1_col4\" class=\"data row1 col4\" >186.0</td>\n",
" <td id=\"T_0b020_row1_col5\" class=\"data row1 col5\" >97.7</td>\n", " <td id=\"T_cf90d_row1_col5\" class=\"data row1 col5\" >97.7</td>\n",
" </tr>\n", " </tr>\n",
" <tr>\n", " <tr>\n",
" <th id=\"T_0b020_level0_row2\" class=\"row_heading level0 row2\" >1</th>\n", " <th id=\"T_cf90d_level0_row2\" class=\"row_heading level0 row2\" >1</th>\n",
" <td id=\"T_0b020_row2_col0\" class=\"data row2 col0\" >2.2 WGAN-GP</td>\n", " <td id=\"T_cf90d_row2_col0\" class=\"data row2 col0\" >2.2 WGAN-GP</td>\n",
" <td id=\"T_0b020_row2_col1\" class=\"data row2 col1\" >489.6</td>\n", " <td id=\"T_cf90d_row2_col1\" class=\"data row2 col1\" >489.6</td>\n",
" <td id=\"T_0b020_row2_col2\" class=\"data row2 col2\" >474.6</td>\n", " <td id=\"T_cf90d_row2_col2\" class=\"data row2 col2\" >474.6</td>\n",
" <td id=\"T_0b020_row2_col3\" class=\"data row2 col3\" >421.3</td>\n", " <td id=\"T_cf90d_row2_col3\" class=\"data row2 col3\" >421.3</td>\n",
" <td id=\"T_0b020_row2_col4\" class=\"data row2 col4\" >421.3</td>\n", " <td id=\"T_cf90d_row2_col4\" class=\"data row2 col4\" >421.3</td>\n",
" <td id=\"T_0b020_row2_col5\" class=\"data row2 col5\" >27.1</td>\n", " <td id=\"T_cf90d_row2_col5\" class=\"data row2 col5\" >27.1</td>\n",
" </tr>\n", " </tr>\n",
" <tr>\n", " <tr>\n",
" <th id=\"T_0b020_level0_row3\" class=\"row_heading level0 row3\" >0</th>\n", " <th id=\"T_cf90d_level0_row3\" class=\"row_heading level0 row3\" >0</th>\n",
" <td id=\"T_0b020_row3_col0\" class=\"data row3 col0\" >2.1 DCGAN (BCE)</td>\n", " <td id=\"T_cf90d_row3_col0\" class=\"data row3 col0\" >2.1 DCGAN (BCE)</td>\n",
" <td id=\"T_0b020_row3_col1\" class=\"data row3 col1\" >444.3</td>\n", " <td id=\"T_cf90d_row3_col1\" class=\"data row3 col1\" >444.3</td>\n",
" <td id=\"T_0b020_row3_col2\" class=\"data row3 col2\" >438.9</td>\n", " <td id=\"T_cf90d_row3_col2\" class=\"data row3 col2\" >438.9</td>\n",
" <td id=\"T_0b020_row3_col3\" class=\"data row3 col3\" >429.3</td>\n", " <td id=\"T_cf90d_row3_col3\" class=\"data row3 col3\" >429.3</td>\n",
" <td id=\"T_0b020_row3_col4\" class=\"data row3 col4\" >429.3</td>\n", " <td id=\"T_cf90d_row3_col4\" class=\"data row3 col4\" >429.3</td>\n",
" <td id=\"T_0b020_row3_col5\" class=\"data row3 col5\" >17.8</td>\n", " <td id=\"T_cf90d_row3_col5\" class=\"data row3 col5\" >17.8</td>\n",
" </tr>\n", " </tr>\n",
" </tbody>\n", " </tbody>\n",
"</table>\n" "</table>\n"
], ],
"text/plain": [ "text/plain": [
"<pandas.io.formats.style.Styler at 0x2717dc6bfb0>" "<pandas.io.formats.style.Styler at 0x225ffe45010>"
] ]
}, },
"execution_count": 11, "execution_count": 3,
"metadata": {}, "metadata": {},
"output_type": "execute_result" "output_type": "execute_result"
} }
@@ -265,17 +295,26 @@
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "1f77c814", "id": "33bd39b9",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## 3. FID curves - progression" "## 3. FID curves - progression\n",
"\n",
"This plot shows whether improvements happen gradually or as a step change. It is needed because the final FID table hides training dynamics: here the key story is that the 2.3 stability package changes the whole trajectory, while 2.1 and 2.2 remain collapsed."
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 12, "execution_count": 4,
"id": "6984a849", "id": "bc19fb53",
"metadata": {}, "metadata": {
"execution": {
"iopub.execute_input": "2026-05-14T19:08:47.648234Z",
"iopub.status.busy": "2026-05-14T19:08:47.647237Z",
"iopub.status.idle": "2026-05-14T19:08:48.493796Z",
"shell.execute_reply": "2026-05-14T19:08:48.490766Z"
}
},
"outputs": [ "outputs": [
{ {
"data": { "data": {
@@ -303,7 +342,7 @@
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "f51cdb73", "id": "77d10609",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## 4. Training dynamics\n", "## 4. Training dynamics\n",
@@ -313,9 +352,16 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 13, "execution_count": 5,
"id": "0f420780", "id": "57be87f9",
"metadata": {}, "metadata": {
"execution": {
"iopub.execute_input": "2026-05-14T19:08:48.500457Z",
"iopub.status.busy": "2026-05-14T19:08:48.499437Z",
"iopub.status.idle": "2026-05-14T19:08:49.538522Z",
"shell.execute_reply": "2026-05-14T19:08:49.535987Z"
}
},
"outputs": [ "outputs": [
{ {
"data": { "data": {
@@ -348,7 +394,7 @@
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "7c480a9e", "id": "e78fcd57",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## 5. Sample grids - epoch 100\n", "## 5. Sample grids - epoch 100\n",
@@ -364,9 +410,16 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 14, "execution_count": 6,
"id": "7cba06c7", "id": "1d426097",
"metadata": {}, "metadata": {
"execution": {
"iopub.execute_input": "2026-05-14T19:08:49.546183Z",
"iopub.status.busy": "2026-05-14T19:08:49.545170Z",
"iopub.status.idle": "2026-05-14T19:08:50.817249Z",
"shell.execute_reply": "2026-05-14T19:08:50.813674Z"
}
},
"outputs": [ "outputs": [
{ {
"data": { "data": {
@@ -392,17 +445,26 @@
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "e984c1c7", "id": "754f83e7",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## 6. Progression - epoch 10 -> 50 -> 100" "## 6. Progression - epoch 10 -> 50 -> 100\n",
"\n",
"These panels connect time to visual quality. For the collapsed runs, the gray grids are still information: they show that more epochs did not fix the recipe. For the stabilized run, the same timeline shows recognizable faces emerging."
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 15, "execution_count": 7,
"id": "bf39e6ec", "id": "77d5b876",
"metadata": {}, "metadata": {
"execution": {
"iopub.execute_input": "2026-05-14T19:08:50.842106Z",
"iopub.status.busy": "2026-05-14T19:08:50.842106Z",
"iopub.status.idle": "2026-05-14T19:08:54.153113Z",
"shell.execute_reply": "2026-05-14T19:08:54.151583Z"
}
},
"outputs": [ "outputs": [
{ {
"data": { "data": {
@@ -463,17 +525,26 @@
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "3800b23b", "id": "fdb18bce",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## 7. Pairwise comparison - what each step bought us" "## 7. Pairwise comparison - what each step bought us\n",
"\n",
"Each pair isolates one decision. The purpose is to avoid saying simply that the final GAN is better: the comparison shows that Wasserstein loss alone is insufficient, the stability package is decisive, and 128x128 is premature under the saved compute budget."
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 16, "execution_count": 8,
"id": "a113d215", "id": "4aed1b79",
"metadata": {}, "metadata": {
"execution": {
"iopub.execute_input": "2026-05-14T19:08:54.188809Z",
"iopub.status.busy": "2026-05-14T19:08:54.186606Z",
"iopub.status.idle": "2026-05-14T19:08:56.310301Z",
"shell.execute_reply": "2026-05-14T19:08:56.307480Z"
}
},
"outputs": [ "outputs": [
{ {
"data": { "data": {
@@ -527,6 +598,7 @@
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "95fd4f90",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## 8. What this phase proves\n", "## 8. What this phase proves\n",
@@ -555,11 +627,21 @@
], ],
"metadata": { "metadata": {
"kernelspec": { "kernelspec": {
"display_name": "Python 3", "display_name": ".venv",
"language": "python",
"name": "python3" "name": "python3"
}, },
"language_info": { "language_info": {
"name": "python" "codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.10"
} }
}, },
"nbformat": 4, "nbformat": 4,
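
The FID comparison table rendered in the GAN notebook diff above lists four runs with FID at epochs 25, 50, and 100, ordered by their best FID. The sketch below reproduces that ranking from the values shown in the table. It assumes "Best FID" is the minimum over the three listed checkpoints, which is consistent with the table but not confirmed by the notebook source; `rank_runs` is a hypothetical helper, not part of the repository.

```python
# FID values copied from the rendered table in the diff (lower is better).
# Keys are checkpoint epochs.
fid_by_run = {
    "2.1 DCGAN (BCE)": {25: 444.3, 50: 438.9, 100: 429.3},
    "2.2 WGAN-GP": {25: 489.6, 50: 474.6, 100: 421.3},
    "2.3 + SN + Attn": {25: 274.4, 50: 223.2, 100: 110.1},
    "2.4 + 128x128": {25: 428.6, 50: 264.3, 100: 186.0},
}


def rank_runs(fid_table):
    """Return (run, best_fid, best_epoch) tuples sorted by best FID ascending.

    Assumption: "best" means the minimum FID over the listed checkpoints.
    """
    ranked = []
    for run, fids in fid_table.items():
        best_epoch = min(fids, key=fids.get)
        ranked.append((run, fids[best_epoch], best_epoch))
    return sorted(ranked, key=lambda row: row[1])


ranking = rank_runs(fid_by_run)
for run, best_fid, best_epoch in ranking:
    print(f"{run:<18} best FID {best_fid:6.1f} at epoch {best_epoch}")
```

As the first notebook in this diff notes, FID values like these are only comparable within the same phase and evaluation setup, so a ranking of this kind should not be mixed with numbers from other phases.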
File diff suppressed because one or more lines are too long
@@ -2,9 +2,10 @@
"cells": [ "cells": [
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "2d43e849",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Phase 4 - DDPM Progression\n", "# 05 - DDPM Recipe Progression\n",
"\n", "\n",
"Phase 4 applies the same report logic to diffusion models. The pipeline is\n", "Phase 4 applies the same report logic to diffusion models. The pipeline is\n",
"already fixed, so this notebook isolates the DDPM recipe: schedule, prediction\n", "already fixed, so this notebook isolates the DDPM recipe: schedule, prediction\n",
@@ -44,6 +45,7 @@
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "fe4b4147",
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Reference: Phase 0 baseline from the same family\n", "### Reference: Phase 0 baseline from the same family\n",
@@ -57,7 +59,15 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 1, "execution_count": 1,
"metadata": {}, "id": "ad73d7ca",
"metadata": {
"execution": {
"iopub.execute_input": "2026-05-14T19:09:11.615922Z",
"iopub.status.busy": "2026-05-14T19:09:11.615922Z",
"iopub.status.idle": "2026-05-14T19:09:13.772612Z",
"shell.execute_reply": "2026-05-14T19:09:13.769968Z"
}
},
"outputs": [], "outputs": [],
"source": [ "source": [
"import json\n", "import json\n",
@@ -116,6 +126,7 @@
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "2fd648bc",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## 1. Load experiment logs\n", "## 1. Load experiment logs\n",
@@ -126,7 +137,15 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 2, "execution_count": 2,
"metadata": {}, "id": "5abd8b09",
"metadata": {
"execution": {
"iopub.execute_input": "2026-05-14T19:09:13.780145Z",
"iopub.status.busy": "2026-05-14T19:09:13.779153Z",
"iopub.status.idle": "2026-05-14T19:09:13.800005Z",
"shell.execute_reply": "2026-05-14T19:09:13.797633Z"
}
},
"outputs": [ "outputs": [
{ {
"name": "stdout", "name": "stdout",
@@ -154,6 +173,7 @@
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "4d563bd4",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## 2. FID comparison table\n", "## 2. FID comparison table\n",
@@ -164,72 +184,80 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 3, "execution_count": 3,
"metadata": {}, "id": "bf039a10",
"metadata": {
"execution": {
"iopub.execute_input": "2026-05-14T19:09:13.808455Z",
"iopub.status.busy": "2026-05-14T19:09:13.807458Z",
"iopub.status.idle": "2026-05-14T19:09:14.118770Z",
"shell.execute_reply": "2026-05-14T19:09:14.115475Z"
}
},
"outputs": [ "outputs": [
{ {
"data": { "data": {
"text/html": [ "text/html": [
"<style type=\"text/css\">\n", "<style type=\"text/css\">\n",
"</style>\n", "</style>\n",
"<table id=\"T_7ea0f\">\n", "<table id=\"T_2697a\">\n",
" <thead>\n", " <thead>\n",
" <tr>\n", " <tr>\n",
" <th class=\"blank level0\" >&nbsp;</th>\n", " <th class=\"blank level0\" >&nbsp;</th>\n",
" <th id=\"T_7ea0f_level0_col0\" class=\"col_heading level0 col0\" >Run</th>\n", " <th id=\"T_2697a_level0_col0\" class=\"col_heading level0 col0\" >Run</th>\n",
" <th id=\"T_7ea0f_level0_col1\" class=\"col_heading level0 col1\" >FID@25</th>\n", " <th id=\"T_2697a_level0_col1\" class=\"col_heading level0 col1\" >FID@25</th>\n",
" <th id=\"T_7ea0f_level0_col2\" class=\"col_heading level0 col2\" >FID@50</th>\n", " <th id=\"T_2697a_level0_col2\" class=\"col_heading level0 col2\" >FID@50</th>\n",
" <th id=\"T_7ea0f_level0_col3\" class=\"col_heading level0 col3\" >FID@100</th>\n", " <th id=\"T_2697a_level0_col3\" class=\"col_heading level0 col3\" >FID@100</th>\n",
" <th id=\"T_7ea0f_level0_col4\" class=\"col_heading level0 col4\" >Best FID</th>\n", " <th id=\"T_2697a_level0_col4\" class=\"col_heading level0 col4\" >Best FID</th>\n",
" <th id=\"T_7ea0f_level0_col5\" class=\"col_heading level0 col5\" >Loss@100</th>\n", " <th id=\"T_2697a_level0_col5\" class=\"col_heading level0 col5\" >Loss@100</th>\n",
" <th id=\"T_7ea0f_level0_col6\" class=\"col_heading level0 col6\" >Train (min)</th>\n", " <th id=\"T_2697a_level0_col6\" class=\"col_heading level0 col6\" >Train (min)</th>\n",
" </tr>\n", " </tr>\n",
" </thead>\n", " </thead>\n",
" <tbody>\n", " <tbody>\n",
" <tr>\n", " <tr>\n",
" <th id=\"T_7ea0f_level0_row0\" class=\"row_heading level0 row0\" >3</th>\n", " <th id=\"T_2697a_level0_row0\" class=\"row_heading level0 row0\" >3</th>\n",
" <td id=\"T_7ea0f_row0_col0\" class=\"data row0 col0\" >4.4 wider net</td>\n", " <td id=\"T_2697a_row0_col0\" class=\"data row0 col0\" >4.4 wider net</td>\n",
" <td id=\"T_7ea0f_row0_col1\" class=\"data row0 col1\" >325.0</td>\n", " <td id=\"T_2697a_row0_col1\" class=\"data row0 col1\" >325.0</td>\n",
" <td id=\"T_7ea0f_row0_col2\" class=\"data row0 col2\" >170.9</td>\n", " <td id=\"T_2697a_row0_col2\" class=\"data row0 col2\" >170.9</td>\n",
" <td id=\"T_7ea0f_row0_col3\" class=\"data row0 col3\" >30.0</td>\n", " <td id=\"T_2697a_row0_col3\" class=\"data row0 col3\" >30.0</td>\n",
" <td id=\"T_7ea0f_row0_col4\" class=\"data row0 col4\" >30.0</td>\n", " <td id=\"T_2697a_row0_col4\" class=\"data row0 col4\" >30.0</td>\n",
" <td id=\"T_7ea0f_row0_col5\" class=\"data row0 col5\" >0.0582</td>\n", " <td id=\"T_2697a_row0_col5\" class=\"data row0 col5\" >0.0582</td>\n",
" <td id=\"T_7ea0f_row0_col6\" class=\"data row0 col6\" >536.4</td>\n", " <td id=\"T_2697a_row0_col6\" class=\"data row0 col6\" >536.4</td>\n",
" </tr>\n", " </tr>\n",
" <tr>\n", " <tr>\n",
" <th id=\"T_7ea0f_level0_row1\" class=\"row_heading level0 row1\" >2</th>\n", " <th id=\"T_2697a_level0_row1\" class=\"row_heading level0 row1\" >2</th>\n",
" <td id=\"T_7ea0f_row1_col0\" class=\"data row1 col0\" >4.3 cosine / v</td>\n", " <td id=\"T_2697a_row1_col0\" class=\"data row1 col0\" >4.3 cosine / v</td>\n",
" <td id=\"T_7ea0f_row1_col1\" class=\"data row1 col1\" >315.9</td>\n", " <td id=\"T_2697a_row1_col1\" class=\"data row1 col1\" >315.9</td>\n",
" <td id=\"T_7ea0f_row1_col2\" class=\"data row1 col2\" >160.7</td>\n", " <td id=\"T_2697a_row1_col2\" class=\"data row1 col2\" >160.7</td>\n",
" <td id=\"T_7ea0f_row1_col3\" class=\"data row1 col3\" >34.5</td>\n", " <td id=\"T_2697a_row1_col3\" class=\"data row1 col3\" >34.5</td>\n",
" <td id=\"T_7ea0f_row1_col4\" class=\"data row1 col4\" >34.5</td>\n", " <td id=\"T_2697a_row1_col4\" class=\"data row1 col4\" >34.5</td>\n",
" <td id=\"T_7ea0f_row1_col5\" class=\"data row1 col5\" >0.0594</td>\n", " <td id=\"T_2697a_row1_col5\" class=\"data row1 col5\" >0.0594</td>\n",
" <td id=\"T_7ea0f_row1_col6\" class=\"data row1 col6\" >278.8</td>\n", " <td id=\"T_2697a_row1_col6\" class=\"data row1 col6\" >278.8</td>\n",
" </tr>\n", " </tr>\n",
" <tr>\n", " <tr>\n",
" <th id=\"T_7ea0f_level0_row2\" class=\"row_heading level0 row2\" >1</th>\n", " <th id=\"T_2697a_level0_row2\" class=\"row_heading level0 row2\" >1</th>\n",
" <td id=\"T_7ea0f_row2_col0\" class=\"data row2 col0\" >4.2 cosine / epsilon</td>\n", " <td id=\"T_2697a_row2_col0\" class=\"data row2 col0\" >4.2 cosine / epsilon</td>\n",
" <td id=\"T_7ea0f_row2_col1\" class=\"data row2 col1\" >249.7</td>\n", " <td id=\"T_2697a_row2_col1\" class=\"data row2 col1\" >249.7</td>\n",
" <td id=\"T_7ea0f_row2_col2\" class=\"data row2 col2\" >282.2</td>\n", " <td id=\"T_2697a_row2_col2\" class=\"data row2 col2\" >282.2</td>\n",
" <td id=\"T_7ea0f_row2_col3\" class=\"data row2 col3\" >132.3</td>\n", " <td id=\"T_2697a_row2_col3\" class=\"data row2 col3\" >132.3</td>\n",
" <td id=\"T_7ea0f_row2_col4\" class=\"data row2 col4\" >132.3</td>\n", " <td id=\"T_2697a_row2_col4\" class=\"data row2 col4\" >132.3</td>\n",
" <td id=\"T_7ea0f_row2_col5\" class=\"data row2 col5\" >0.0285</td>\n", " <td id=\"T_2697a_row2_col5\" class=\"data row2 col5\" >0.0285</td>\n",
" <td id=\"T_7ea0f_row2_col6\" class=\"data row2 col6\" >258.8</td>\n", " <td id=\"T_2697a_row2_col6\" class=\"data row2 col6\" >258.8</td>\n",
" </tr>\n", " </tr>\n",
" <tr>\n", " <tr>\n",
" <th id=\"T_7ea0f_level0_row3\" class=\"row_heading level0 row3\" >0</th>\n", " <th id=\"T_2697a_level0_row3\" class=\"row_heading level0 row3\" >0</th>\n",
" <td id=\"T_7ea0f_row3_col0\" class=\"data row3 col0\" >4.1 linear / epsilon</td>\n", " <td id=\"T_2697a_row3_col0\" class=\"data row3 col0\" >4.1 linear / epsilon</td>\n",
" <td id=\"T_7ea0f_row3_col1\" class=\"data row3 col1\" >333.3</td>\n", " <td id=\"T_2697a_row3_col1\" class=\"data row3 col1\" >333.3</td>\n",
" <td id=\"T_7ea0f_row3_col2\" class=\"data row3 col2\" >311.4</td>\n", " <td id=\"T_2697a_row3_col2\" class=\"data row3 col2\" >311.4</td>\n",
" <td id=\"T_7ea0f_row3_col3\" class=\"data row3 col3\" >134.5</td>\n", " <td id=\"T_2697a_row3_col3\" class=\"data row3 col3\" >134.5</td>\n",
" <td id=\"T_7ea0f_row3_col4\" class=\"data row3 col4\" >134.5</td>\n", " <td id=\"T_2697a_row3_col4\" class=\"data row3 col4\" >134.5</td>\n",
" <td id=\"T_7ea0f_row3_col5\" class=\"data row3 col5\" >0.0150</td>\n", " <td id=\"T_2697a_row3_col5\" class=\"data row3 col5\" >0.0150</td>\n",
" <td id=\"T_7ea0f_row3_col6\" class=\"data row3 col6\" >259.9</td>\n", " <td id=\"T_2697a_row3_col6\" class=\"data row3 col6\" >259.9</td>\n",
" </tr>\n", " </tr>\n",
" </tbody>\n", " </tbody>\n",
"</table>\n" "</table>\n"
], ],
"text/plain": [ "text/plain": [
"<pandas.io.formats.style.Styler at 0x25ae55fd8b0>" "<pandas.io.formats.style.Styler at 0x204a4dc4950>"
] ]
}, },
"execution_count": 3, "execution_count": 3,
@@ -258,15 +286,26 @@
}, },
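The ranking in the FID table above can be reproduced from the per-checkpoint values alone. This is a minimal sketch, assuming a hypothetical in-memory layout of `{run label: {epoch: FID}}`; the real schema of the files under `outputs/logs` is not visible in this diff, and `summarize_fid` is an illustrative name, not a function from the repository. The numbers are copied from the rendered table.

```python
# Hypothetical log layout (assumption): run label -> {epoch: FID}.
# Values are copied from the rendered FID table in the diff above.
fid_logs = {
    "4.1 linear / epsilon": {25: 333.3, 50: 311.4, 100: 134.5},
    "4.2 cosine / epsilon": {25: 249.7, 50: 282.2, 100: 132.3},
    "4.3 cosine / v": {25: 315.9, 50: 160.7, 100: 34.5},
    "4.4 wider net": {25: 325.0, 50: 170.9, 100: 30.0},
}


def summarize_fid(logs):
    """Return one row per run, sorted by best (lowest) FID."""
    rows = []
    for run, fid_by_epoch in logs.items():
        rows.append({
            "Run": run,
            "Best FID": min(fid_by_epoch.values()),
            "FID@100": fid_by_epoch[100],
        })
    # Lower FID is better, so an ascending sort puts the strongest run first.
    return sorted(rows, key=lambda row: row["Best FID"])


ranking = summarize_fid(fid_logs)
for row in ranking:
    print(f'{row["Run"]:<22} best={row["Best FID"]:.1f}')

# Size of the step between epsilon- and v-prediction under the cosine schedule.
v_gain = fid_logs["4.2 cosine / epsilon"][100] - fid_logs["4.3 cosine / v"][100]
```

With these values, the gap between runs 4.2 and 4.3 at epoch 100 is much larger than the gap between 4.3 and 4.4, which is the comparison the next section relies on.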
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "f914b6a3",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## 3. FID curves - progression" "## 3. FID curves - progression\n",
"\n",
"The curves make the DDPM recipe evolution visible. The main evidence is not only that the wider final model wins, but that the big jump happens when the prediction target changes to v-prediction."
] ]
}, },
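The switch to v-prediction changes only the regression target, not the forward noising process. The sketch below assumes the commonly used definition `v = sqrt(alpha_bar) * eps - sqrt(1 - alpha_bar) * x0`; the project's training code is not part of this diff, so the function names are hypothetical and the lists stand in for image tensors.

```python
import math
import random


def q_sample(x0, eps, alpha_bar):
    """Forward diffusion: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(x0, eps)]


def v_target(x0, eps, alpha_bar):
    """v-prediction target: v = sqrt(a) * eps - sqrt(1 - a) * x0."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * e - b * x for x, e in zip(x0, eps)]


def x0_from_v(x_t, v, alpha_bar):
    """Recover the clean sample: x0 = sqrt(a) * x_t - sqrt(1 - a) * v."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * xt - b * vv for xt, vv in zip(x_t, v)]


def eps_from_v(x_t, v, alpha_bar):
    """Recover the noise: eps = sqrt(1 - a) * x_t + sqrt(a) * v."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [b * xt + a * vv for xt, vv in zip(x_t, v)]


rng = random.Random(0)
x0 = [rng.uniform(-1.0, 1.0) for _ in range(8)]   # stand-in for pixel values
eps = [rng.gauss(0.0, 1.0) for _ in range(8)]     # Gaussian noise
alpha_bar = 0.3                                   # arbitrary timestep

x_t = q_sample(x0, eps, alpha_bar)
v = v_target(x0, eps, alpha_bar)
x0_rec = x0_from_v(x_t, v, alpha_bar)
eps_rec = eps_from_v(x_t, v, alpha_bar)
```

Because both `x0` and `eps` can be recovered from `(x_t, v)`, a sampler written for epsilon-prediction can be reused after converting the network output.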
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 4, "execution_count": 4,
"metadata": {}, "id": "f4473091",
"metadata": {
"execution": {
"iopub.execute_input": "2026-05-14T19:09:14.126299Z",
"iopub.status.busy": "2026-05-14T19:09:14.126299Z",
"iopub.status.idle": "2026-05-14T19:09:14.871321Z",
"shell.execute_reply": "2026-05-14T19:09:14.868302Z"
}
},
"outputs": [ "outputs": [
{ {
"data": { "data": {
@@ -294,6 +333,7 @@
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "ebf477f6",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## 4. Training loss\n", "## 4. Training loss\n",
@@ -304,7 +344,15 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 5, "execution_count": 5,
"metadata": {}, "id": "7aa4954f",
"metadata": {
"execution": {
"iopub.execute_input": "2026-05-14T19:09:14.879854Z",
"iopub.status.busy": "2026-05-14T19:09:14.878851Z",
"iopub.status.idle": "2026-05-14T19:09:15.438805Z",
"shell.execute_reply": "2026-05-14T19:09:15.437273Z"
}
},
"outputs": [ "outputs": [
{ {
"data": { "data": {
@@ -332,15 +380,26 @@
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "583bc733",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## 5. Sample grids - epoch 100" "## 5. Sample grids - epoch 100\n",
"\n",
"These grids show the qualitative side of the FID drop. They should be read as independent samples from each checkpoint, with attention to global face structure, texture noise, and artifact frequency."
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 6, "execution_count": 6,
"metadata": {}, "id": "db2f8780",
"metadata": {
"execution": {
"iopub.execute_input": "2026-05-14T19:09:15.444735Z",
"iopub.status.busy": "2026-05-14T19:09:15.444735Z",
"iopub.status.idle": "2026-05-14T19:09:17.113892Z",
"shell.execute_reply": "2026-05-14T19:09:17.109886Z"
}
},
"outputs": [ "outputs": [
{ {
"data": { "data": {
@@ -367,6 +426,7 @@
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "62183fef",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## 6. Progression - epoch 10 -> 50 -> 100\n", "## 6. Progression - epoch 10 -> 50 -> 100\n",
@@ -377,7 +437,15 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 7, "execution_count": 7,
"metadata": {}, "id": "8fc44956",
"metadata": {
"execution": {
"iopub.execute_input": "2026-05-14T19:09:17.166548Z",
"iopub.status.busy": "2026-05-14T19:09:17.165543Z",
"iopub.status.idle": "2026-05-14T19:09:20.642400Z",
"shell.execute_reply": "2026-05-14T19:09:20.638875Z"
}
},
"outputs": [ "outputs": [
{ {
"data": { "data": {
@@ -438,6 +506,7 @@
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "1e4e025a",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## 7. Noise schedule visualization\n", "## 7. Noise schedule visualization\n",
@@ -448,7 +517,15 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 8, "execution_count": 8,
"metadata": {}, "id": "ad0a9460",
"metadata": {
"execution": {
"iopub.execute_input": "2026-05-14T19:09:20.684626Z",
"iopub.status.busy": "2026-05-14T19:09:20.683477Z",
"iopub.status.idle": "2026-05-14T19:09:21.615882Z",
"shell.execute_reply": "2026-05-14T19:09:21.613250Z"
}
},
"outputs": [ "outputs": [
{ {
"data": { "data": {
@@ -485,6 +562,7 @@
}, },
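For reference, the two schedule families named in the run labels ("linear" and "cosine") are usually defined as below. This is a sketch under stated assumptions: the constants (`beta_start=1e-4`, `beta_end=0.02`, `s=0.008`, `T=1000`) are the widely used defaults, not values confirmed by this repository, and the helper names are illustrative.

```python
import math


def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced per-step noise variances (assumed default range)."""
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + i * step for i in range(T)]


def cosine_betas(T, s=0.008, max_beta=0.999):
    """Betas derived from a squared-cosine cumulative signal curve."""
    def f(t):
        return math.cos((t / T + s) / (1 + s) * math.pi / 2) ** 2
    # beta_t = 1 - f(t + 1) / f(t), clipped to avoid a degenerate last step.
    return [min(1.0 - f(t + 1) / f(t), max_beta) for t in range(T)]


def alpha_bars(betas):
    """Cumulative product of (1 - beta): fraction of signal kept at each step."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


T = 1000
lin_ab = alpha_bars(linear_betas(T))
cos_ab = alpha_bars(cosine_betas(T))

# Compare how much signal survives halfway through each schedule.
mid = T // 2
print(f"alpha_bar at t={mid}: linear={lin_ab[mid]:.3f}, cosine={cos_ab[mid]:.3f}")
```

Under these assumed constants, the cosine schedule keeps more signal through the middle timesteps, whereas the linear schedule destroys most of it earlier.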
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "ae1b0cff",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## 8. What this phase proves\n", "## 8. What this phase proves\n",
File diff suppressed because one or more lines are too long
@@ -2,9 +2,10 @@
"cells": [ "cells": [
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "c493f3b6",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Phase 6 - Final Selected Samples\n", "# 07 - Final Sample Showcase\n",
"\n", "\n",
"This final notebook is a small showcase chapter. Phase 5 compared the model\n", "This final notebook is a small showcase chapter. Phase 5 compared the model\n",
"families quantitatively; this notebook selects the three strongest individual\n", "families quantitatively; this notebook selects the three strongest individual\n",
@@ -28,7 +29,15 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 1, "execution_count": 1,
"metadata": {}, "id": "de83c749",
"metadata": {
"execution": {
"iopub.execute_input": "2026-05-14T19:09:57.835552Z",
"iopub.status.busy": "2026-05-14T19:09:57.834037Z",
"iopub.status.idle": "2026-05-14T19:10:07.120969Z",
"shell.execute_reply": "2026-05-14T19:10:07.118954Z"
}
},
"outputs": [], "outputs": [],
"source": [ "source": [
"import json\n", "import json\n",
@@ -101,6 +110,7 @@
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "c1549235",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## 1. Candidate pool\n", "## 1. Candidate pool\n",
@@ -112,7 +122,15 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 2, "execution_count": 2,
"metadata": {}, "id": "3d6920c0",
"metadata": {
"execution": {
"iopub.execute_input": "2026-05-14T19:10:07.129524Z",
"iopub.status.busy": "2026-05-14T19:10:07.128523Z",
"iopub.status.idle": "2026-05-14T19:10:07.186457Z",
"shell.execute_reply": "2026-05-14T19:10:07.184404Z"
}
},
"outputs": [ "outputs": [
{ {
"data": { "data": {
@@ -204,6 +222,7 @@
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "ddf6948b",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## 2. Selection method\n", "## 2. Selection method\n",
@@ -222,7 +241,15 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 3, "execution_count": 3,
"metadata": {}, "id": "5b9c7533",
"metadata": {
"execution": {
"iopub.execute_input": "2026-05-14T19:10:07.195450Z",
"iopub.status.busy": "2026-05-14T19:10:07.194451Z",
"iopub.status.idle": "2026-05-14T19:10:12.841338Z",
"shell.execute_reply": "2026-05-14T19:10:12.837814Z"
}
},
"outputs": [ "outputs": [
{ {
"data": { "data": {
@@ -361,6 +388,7 @@
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "765d5532",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## 3. Top three per architecture\n", "## 3. Top three per architecture\n",
@@ -372,7 +400,15 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 4, "execution_count": 4,
"metadata": {}, "id": "91930ee8",
"metadata": {
"execution": {
"iopub.execute_input": "2026-05-14T19:10:12.848661Z",
"iopub.status.busy": "2026-05-14T19:10:12.846661Z",
"iopub.status.idle": "2026-05-14T19:10:13.415473Z",
"shell.execute_reply": "2026-05-14T19:10:13.412445Z"
}
},
"outputs": [ "outputs": [
{ {
"data": { "data": {
@@ -577,6 +613,7 @@
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "ff25482c",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## 4. Final selected images\n", "## 4. Final selected images\n",
@@ -587,7 +624,15 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 5, "execution_count": 5,
"metadata": {}, "id": "57b87ab7",
"metadata": {
"execution": {
"iopub.execute_input": "2026-05-14T19:10:13.422889Z",
"iopub.status.busy": "2026-05-14T19:10:13.420888Z",
"iopub.status.idle": "2026-05-14T19:10:14.341719Z",
"shell.execute_reply": "2026-05-14T19:10:14.339886Z"
}
},
"outputs": [ "outputs": [
{ {
"data": { "data": {
@@ -611,6 +656,7 @@
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "8cd4f5cd",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## 5. Report conclusion\n", "## 5. Report conclusion\n",
+61 -34
@@ -9,6 +9,7 @@ story structure: goal, what changed, evidence, decision, and conclusion.
Real metric values are pulled from outputs/logs/*.json at build time and Real metric values are pulled from outputs/logs/*.json at build time and
rendered into markdown headers and conclusions, so reports never drift from data. rendered into markdown headers and conclusions, so reports never drift from data.
The generated filenames are numbered to make the intended reading order clear.
""" """
import json import json
from pathlib import Path from pathlib import Path
@@ -17,6 +18,18 @@ ROOT = Path(__file__).resolve().parents[1]
LOGS = ROOT / "outputs" / "logs" LOGS = ROOT / "outputs" / "logs"
OUT = ROOT / "notebooks" OUT = ROOT / "notebooks"
NOTEBOOK_SEQUENCE = {
"phase0": "01_baseline_sanity_check",
"phase1": "02_pipeline_selection",
"phase2": "03_gan_stability_progression",
"phase3": "04_vae_loss_progression",
"phase4": "05_ddpm_recipe_progression",
"phase5": "06_final_family_comparison",
"phase6": "07_final_sample_showcase",
}
OLD_GENERATED_PATTERNS = ["phase*.ipynb"]
# notebook helpers # notebook helpers
def md(text): return {"cell_type": "markdown", "metadata": {}, "source": text.splitlines(keepends=True)} def md(text): return {"cell_type": "markdown", "metadata": {}, "source": text.splitlines(keepends=True)}
@@ -36,6 +49,12 @@ def write_nb(name, cells):
path.write_text(json.dumps(nb, indent=1)) path.write_text(json.dumps(nb, indent=1))
print(f" wrote {path.relative_to(ROOT)}") print(f" wrote {path.relative_to(ROOT)}")
def remove_old_generated_notebooks():
for pattern in OLD_GENERATED_PATTERNS:
for path in OUT.glob(pattern):
path.unlink()
print(f" removed {path.relative_to(ROOT)}")
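The cleanup step can be exercised in isolation before it is pointed at the real `notebooks` directory. The sketch below is a variant for testing only: unlike the function in this commit, it takes the directory as a parameter and returns the removed names, and it runs against a temporary directory. The file names are taken from the old and new names visible in this diff.

```python
import tempfile
from pathlib import Path

OLD_GENERATED_PATTERNS = ["phase*.ipynb"]


def remove_old_generated_notebooks(out_dir, patterns=OLD_GENERATED_PATTERNS):
    """Delete notebooks matching the legacy naming patterns; return their names."""
    removed = []
    for pattern in patterns:
        # sorted() makes the removal order deterministic across platforms.
        for path in sorted(out_dir.glob(pattern)):
            path.unlink()
            removed.append(path.name)
    return removed


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    for name in [
        "phase0_analysis.ipynb",              # legacy name, should be removed
        "phase1_analysis.ipynb",              # legacy name, should be removed
        "05_ddpm_recipe_progression.ipynb",   # new numbered name, should stay
    ]:
        (out / name).write_text("{}")

    removed = remove_old_generated_notebooks(out)
    remaining = sorted(p.name for p in out.iterdir())
```

The `phase*.ipynb` pattern cannot match the numbered names because they all start with a digit, so regenerated notebooks are safe even if cleanup runs after writing.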
# log-derived facts, computed once and baked into markdown # log-derived facts, computed once and baked into markdown
def load(name): def load(name):
@@ -120,7 +139,7 @@ def build_phase0():
p0 = {n: load(n) for n in ["p0_wgan", "p0_vae", "p0_ddpm", "p0_ddpm_small"]} p0 = {n: load(n) for n in ["p0_wgan", "p0_vae", "p0_ddpm", "p0_ddpm_small"]}
cells = [ cells = [
md(f"""\ md(f"""\
# Phase 0 - Baseline Sanity Check # 01 - Baseline Sanity Check
Phase 0 is the starting point of the generator story. It uses the raw, un-aligned Phase 0 is the starting point of the generator story. It uses the raw, un-aligned
images and very plain versions of each model family so we can confirm that the images and very plain versions of each model family so we can confirm that the
@@ -146,7 +165,7 @@ FID was not logged in Phase 0. The evidence here is loss behavior plus saved
sample grids. sample grids.
"""), """),
code(SHARED_IMPORTS), code(SHARED_IMPORTS),
md("## 1. Training loss curves\n\nThese curves check that the loops ran and produced stable logs. They are not enough to prove visual quality."), md("## 1. Training loss curves\n\nThese curves check that the loops ran and produced stable logs. They are not enough to prove visual quality, but they are needed before interpreting samples: a broken optimization loop would make every later visual comparison meaningless.\n\n**What to look for:** the curves should move smoothly enough to show that each family is learning something. The limitation is that loss scale differs by family, so the curves compare stability, not final image quality."),
code("""\ code("""\
runs = {n: load_log(n) for n in ["p0_wgan", "p0_vae", "p0_ddpm", "p0_ddpm_small"]} runs = {n: load_log(n) for n in ["p0_wgan", "p0_vae", "p0_ddpm", "p0_ddpm_small"]}
runs = {k: v for k, v in runs.items() if v} runs = {k: v for k, v in runs.items() if v}
@@ -181,7 +200,7 @@ axes[2].set_xlabel("Epoch"); axes[2].legend()
plt.tight_layout(); plt.show() plt.tight_layout(); plt.show()
"""), """),
md("## 2. Final sample grids\n\nThe final previews show the practical failure mode of the raw pipeline: the samples have some face-like structure, but identity, alignment, and detail are not under control. These PNGs are displayed exactly as saved, so older Phase 0 matrices keep their original layout instead of being forced into 4x4."), md("## 2. Final sample grids\n\nThe final previews show the practical failure mode of the raw pipeline: the samples have some face-like structure, but identity, alignment, and detail are not under control. These PNGs are displayed exactly as saved, so older Phase 0 matrices keep their original layout instead of being forced into 4x4.\n\n**Why this matters:** this is the visual evidence that the first bottleneck is not only the model family. The data still contains too much pose, scale, and background variation for tiny baseline recipes."),
code("""\ code("""\
last_epochs = {"p0_wgan": 200, "p0_vae": 100, "p0_ddpm": 200, "p0_ddpm_small": 100} last_epochs = {"p0_wgan": 200, "p0_vae": 100, "p0_ddpm": 200, "p0_ddpm_small": 100}
@@ -196,7 +215,7 @@ for ax, (name, ep) in zip(axes, last_epochs.items()):
ax.axis("off") ax.axis("off")
plt.tight_layout(); plt.show() plt.tight_layout(); plt.show()
"""), """),
md("## 3. Progression - early vs late\n\nThe progression grids make the baseline failure visible over time. Later samples improve slightly, but the raw input distribution keeps the task too broad. The saved matrices are shown in their original layout."), md("## 3. Progression - early vs late\n\nThe progression grids make the baseline failure visible over time. Later samples improve slightly, but the raw input distribution keeps the task too broad. The saved matrices are shown in their original layout.\n\n**How to read it:** if more epochs only turn noise into rough face-like blobs, the next decision should be pipeline cleanup rather than simply training the same recipe longer."),
code("""\ code("""\
checkpoints = { checkpoints = {
"p0_wgan": [50, 100, 200], "p0_wgan": [50, 100, 200],
@@ -231,7 +250,7 @@ locks the pipeline before the project spends more compute on stronger recipes.
establishes the baseline failure and motivates the move to aligned face crops. establishes the baseline failure and motivates the move to aligned face crops.
"""), """),
] ]
write_nb("phase0_analysis", cells) write_nb(NOTEBOOK_SEQUENCE["phase0"], cells)
# PHASE 1 - Pipeline ablations with a DCGAN proxy # PHASE 1 - Pipeline ablations with a DCGAN proxy
@@ -246,7 +265,7 @@ def build_phase1():
cells = [ cells = [
md(f"""\ md(f"""\
# Phase 1 - Pipeline Selection # 02 - Pipeline Selection
Phase 1 answers the data-handling question left open by the baseline. Instead Phase 1 answers the data-handling question left open by the baseline. Instead
of changing the model family, it uses a cheap DCGAN proxy and varies one of changing the model family, it uses a cheap DCGAN proxy and varies one
@@ -279,7 +298,7 @@ without any pipeline tuning, and it also collapsed. Phase 1 below uses the same
with the data pipeline systematically varied; the architecture limitation is constant. with the data pipeline systematically varied; the architecture limitation is constant.
"""), """),
code(SHARED_IMPORTS), code(SHARED_IMPORTS),
md("## 1. Load all experiment logs\n\nAll evidence in this notebook comes from the existing Phase 1 logs and sample folders."), md("## 1. Load all experiment logs\n\nAll evidence in this notebook comes from the existing Phase 1 logs and sample folders. The cell is intentionally simple: it only inventories already saved experiments so the reader knows which pipeline ablations are being compared."),
code("""\ code("""\
run_names = sorted(p.stem for p in LOGS.glob("p1*.json")) run_names = sorted(p.stem for p in LOGS.glob("p1*.json"))
runs = {name: load_log(name) for name in run_names} runs = {name: load_log(name) for name in run_names}
@@ -300,7 +319,7 @@ experiment_groups = {
"p1d_dcgan_combined": "Aligned + raw mixed"}, "p1d_dcgan_combined": "Aligned + raw mixed"},
} }
"""), """),
md("## 2. FID comparison table\n\nThe table ranks the proxy runs. The values are useful within Phase 1, but they should not be compared directly with later FID protocols."), md("## 2. FID comparison table\n\nThe table ranks the proxy runs. It is needed because the visual samples alone can be misleading: a run can look slightly better in one grid while still being worse across the saved distribution. The values are useful within Phase 1, but they should not be compared directly with later FID protocols."),
code("""\ code("""\
rows = [] rows = []
for name in run_names: for name in run_names:
@@ -332,7 +351,7 @@ ax.set_title("Phase 1 - FID across all pipeline ablations")
ax.set_xticks(x); ax.set_xticklabels(labels, rotation=30, ha="right") ax.set_xticks(x); ax.set_xticklabels(labels, rotation=30, ha="right")
ax.legend(); plt.tight_layout(); plt.show() ax.legend(); plt.tight_layout(); plt.show()
"""), """),
md("## 3. Controlled ablation results\n\nEach subplot holds the model approximately fixed and changes one pipeline factor. This is the decision evidence for the rest of the generator suite."), md("## 3. Controlled ablation results\n\nEach subplot holds the model approximately fixed and changes one pipeline factor. This is the decision evidence for the rest of the generator suite: alignment, resolution, augmentation, and dataset mixing are treated as pipeline choices, not as disconnected experiments.\n\n**What to look for:** a useful pipeline change should lower FID consistently inside its ablation group, not only produce one nicer-looking example."),
code("""\ code("""\
fig, axes = plt.subplots(2, 2, figsize=(14, 10)) fig, axes = plt.subplots(2, 2, figsize=(14, 10))
axes = axes.flatten() axes = axes.flatten()
@@ -356,6 +375,8 @@ plt.tight_layout(); plt.show()
## 4. Data pipeline visualization ## 4. Data pipeline visualization
What each ablation actually changes, shown on the input data the model sees. What each ablation actually changes, shown on the input data the model sees.
These figures are not model outputs. They explain the input distribution that
each model has to learn, which is why they sit next to the ablation results.
"""), """),
code("""\ code("""\
import random import random
@@ -399,7 +420,7 @@ def show_unavailable(ax, message):
ax.text(0.5, 0.5, message, ha="center", va="center", wrap=True, transform=ax.transAxes) ax.text(0.5, 0.5, message, ha="center", va="center", wrap=True, transform=ax.transAxes)
ax.axis("off") ax.axis("off")
"""), """),
md("### 4A - Resolution\n\nSame raw image at 64x64 and 128x128. This is a paired comparison layout, so it keeps the original 2x4 format instead of being forced into a 4x4 sample grid."), md("### 4A - Resolution\n\nSame raw image at 64x64 and 128x128. This is a paired comparison layout, so it keeps the original 2x4 format instead of being forced into a 4x4 sample grid.\n\n**Interpretation:** 128x128 carries more detail, but it also makes the proxy generator learn a harder distribution. The later decision favors 64x64 because stable face structure matters more than nominal resolution at this budget."),
code("""\ code("""\
paths = sample_paths(RAW, k=4) paths = sample_paths(RAW, k=4)
fig, axes = plt.subplots(2, 4, figsize=(12, 6)) fig, axes = plt.subplots(2, 4, figsize=(12, 6))
@@ -414,7 +435,7 @@ else:
fig.suptitle("1A - Resolution: same image at two scales", fontsize=12, fontweight="bold") fig.suptitle("1A - Resolution: same image at two scales", fontsize=12, fontweight="bold")
plt.tight_layout(); plt.show() plt.tight_layout(); plt.show()
"""), """),
md("### 4B - Alignment\n\nRaw vs MTCNN-aligned 64x64 crops. This paired layout keeps the original 2x4 format so each raw image is directly above its aligned crop."), md("### 4B - Alignment\n\nRaw vs MTCNN-aligned 64x64 crops. This paired layout keeps the original 2x4 format so each raw image is directly above its aligned crop.\n\n**Interpretation:** alignment removes background and scale variation before the generator spends capacity on it. This is why alignment becomes the strongest pipeline lever."),
code("""\ code("""\
pairs = matched_pairs(k=4) pairs = matched_pairs(k=4)
fig, axes = plt.subplots(2, 4, figsize=(12, 6)) fig, axes = plt.subplots(2, 4, figsize=(12, 6))
@@ -429,7 +450,7 @@ else:
fig.suptitle("1B - Alignment: same source image, raw vs MTCNN-aligned", fontsize=12, fontweight="bold") fig.suptitle("1B - Alignment: same source image, raw vs MTCNN-aligned", fontsize=12, fontweight="bold")
plt.tight_layout(); plt.show() plt.tight_layout(); plt.show()
"""), """),
md("### 4C - Augmentation\n\nOne aligned image, then deterministic examples of the saved augmentation idea. This keeps the original compact strip because the point is to compare transforms on one image, not to make a generated 4x4 sample matrix."), md("### 4C - Augmentation\n\nOne aligned image, then deterministic examples of the saved augmentation idea. This keeps the original compact strip because the point is to compare transforms on one image, not to make a generated 4x4 sample matrix.\n\n**Interpretation:** augmentation can make the training distribution broader, but it can also blur already scarce structure. Phase 1 treats it as a pipeline setting to justify, not as an automatic improvement."),
code("""\ code("""\
src = sample_paths(ALIGNED, k=1) src = sample_paths(ALIGNED, k=1)
if src: if src:
@@ -454,7 +475,7 @@ else:
fig.suptitle("1C - Augmentation", fontsize=12, fontweight="bold") fig.suptitle("1C - Augmentation", fontsize=12, fontweight="bold")
plt.tight_layout(); plt.show() plt.tight_layout(); plt.show()
"""), """),
md("### 4D - Dataset mixing\n\nMixing raw and aligned images asks one generator to model two different input distributions. This keeps the original paired 2x4 layout so the contrast is easy to read."), md("### 4D - Dataset mixing\n\nMixing raw and aligned images asks one generator to model two different input distributions. This keeps the original paired 2x4 layout so the contrast is easy to read.\n\n**Interpretation:** mixing increases nuisance variation and makes the generator solve two problems at once. The later phases therefore inherit aligned-only data."),
code("""\ code("""\
pairs = matched_pairs(k=4) pairs = matched_pairs(k=4)
fig, axes = plt.subplots(2, 4, figsize=(12, 6)) fig, axes = plt.subplots(2, 4, figsize=(12, 6))
@@ -492,7 +513,7 @@ decision. Alignment is the main fix; Phase 2 can now focus on the GAN recipe
instead of fighting raw-image variance. instead of fighting raw-image variance.
"""), """),
] ]
write_nb("phase1_analysis", cells) write_nb(NOTEBOOK_SEQUENCE["phase1"], cells)
# PHASE 2 - GAN architecture and objective evolution # PHASE 2 - GAN architecture and objective evolution
@@ -504,7 +525,7 @@ def build_phase2():
cells = [ cells = [
md(f"""\ md(f"""\
# Phase 2 - GAN Progression # 03 - GAN Stability Progression
Phase 2 keeps the Phase 1 pipeline fixed and changes the GAN recipe. This makes Phase 2 keeps the Phase 1 pipeline fixed and changes the GAN recipe. This makes
the question narrow: once the data is aligned, what model changes are needed to the question narrow: once the data is aligned, what model changes are needed to
@@ -591,7 +612,7 @@ df = pd.DataFrame(rows).sort_values("Best FID")
df.style.format({"FID@25": "{:.1f}", "FID@50": "{:.1f}", "FID@100": "{:.1f}", df.style.format({"FID@25": "{:.1f}", "FID@50": "{:.1f}", "FID@100": "{:.1f}",
"Best FID": "{:.1f}", "Train (min)": "{:.1f}"}) "Best FID": "{:.1f}", "Train (min)": "{:.1f}"})
"""), """),
md("## 3. FID curves - progression"), md("## 3. FID curves - progression\n\nThis plot shows whether improvements happen gradually or as a step change. It is needed because the final FID table hides training dynamics: here the key story is that the 2.3 stability package changes the whole trajectory, while 2.1 and 2.2 remain collapsed."),
code("""\ code("""\
fig, ax = plt.subplots(figsize=(10, 5)) fig, ax = plt.subplots(figsize=(10, 5))
cmap = plt.cm.viridis cmap = plt.cm.viridis
@@ -643,7 +664,7 @@ for ax, name in zip(axes, run_names):
 show_image_or_missing(ax, img_path, title)
 plt.tight_layout(); plt.show()
 """),
-md("## 6. Progression - epoch 10 -> 50 -> 100"),
+md("## 6. Progression - epoch 10 -> 50 -> 100\n\nThese panels connect time to visual quality. For the collapsed runs, the gray grids are still information: they show that more epochs did not fix the recipe. For the stabilized run, the same timeline shows recognizable faces emerging."),
 code("""\
 check_epochs = [10, 50, 100]
 for name in run_names:
@@ -659,7 +680,7 @@ for name in run_names:
 fig.suptitle(run_labels[name], fontsize=11, fontweight="bold")
 plt.tight_layout(); plt.show()
 """),
-md("## 7. Pairwise comparison - what each step bought us"),
+md("## 7. Pairwise comparison - what each step bought us\n\nEach pair isolates one decision. The purpose is to avoid saying simply that the final GAN is better: the comparison shows that Wasserstein loss alone is insufficient, the stability package is decisive, and 128x128 is premature under the saved compute budget."),
 code("""\
 transitions = [
 ("2.1 -> 2.2: BCE -> Wasserstein", "p2_1_dcgan", "p2_2_wgan"),
@@ -702,7 +723,7 @@ usable generator recipe, but it also shows that higher resolution is not helpful
 without enough training budget.
 """),
 ]
-write_nb("phase2_analysis", cells)
+write_nb(NOTEBOOK_SEQUENCE["phase2"], cells)
 # PHASE 3 - VAE composite-loss evolution
@@ -713,7 +734,7 @@ def build_phase3():
 cells = [
 md(f"""\
-# Phase 3 - VAE Progression
+# 04 - VAE Loss Progression
 Phase 3 studies the VAE family after the pipeline has been locked. The baseline
 VAE is fast and stable, but its MSE + KL objective tends to average away facial
@@ -775,7 +796,7 @@ df = pd.DataFrame(rows).sort_values("Best FID")
 df.style.format({"FID@50": "{:.1f}", "FID@100": "{:.1f}", "Best FID": "{:.1f}",
 "Recon@100": "{:.4f}", "KL@100": "{:.2f}", "Train (min)": "{:.1f}"})
 """),
-md("## 3. FID curves - progression"),
+md("## 3. FID curves - progression\n\nThe curves show how each extra loss changes the generation trajectory, not just the final checkpoint. A useful VAE loss should improve prior-sample FID while preserving the stable behavior that makes VAEs attractive."),
 code("""\
 fig, ax = plt.subplots(figsize=(10, 5))
 cmap = plt.cm.plasma
@@ -805,7 +826,7 @@ axes[1].set_title("KL divergence"); axes[1].set_xlabel("Epoch"); axes[1].
 axes[2].set_title("Perceptual (VGG16)"); axes[2].set_xlabel("Epoch"); axes[2].legend(fontsize=8)
 plt.tight_layout(); plt.show()
 """),
-md("## 5. Prior samples - epoch 100"),
+md("## 5. Prior samples - epoch 100\n\nThese are generated samples from the latent prior, so they answer the true generator question: if we sample a random latent vector, do we get plausible faces? This is different from reconstruction quality."),
 code("""\
 fig, axes = plt.subplots(1, 3, figsize=(13, 4.5))
 for ax, name in zip(axes, run_names):
@@ -818,7 +839,7 @@ for ax, name in zip(axes, run_names):
 fig.suptitle("Prior samples (decoded from N(0, I))", fontsize=12, fontweight="bold")
 plt.tight_layout(); plt.show()
 """),
-md("## 6. Reconstructions - epoch 100"),
+md("## 6. Reconstructions - epoch 100\n\nReconstructions show whether the encoder-decoder still preserves real input structure. They are useful as a diagnostic, but they are not the final generation metric because reconstructing a known image is easier than sampling a new one."),
 code("""\
 fig, axes = plt.subplots(1, 3, figsize=(13, 4.5))
 for ax, name in zip(axes, run_names):
@@ -830,7 +851,7 @@ for ax, name in zip(axes, run_names):
 fig.suptitle("Reconstructions (real / decoded interleaved)", fontsize=12, fontweight="bold")
 plt.tight_layout(); plt.show()
 """),
-md("## 7. Progression - epoch 10 -> 50 -> 100 (prior samples)"),
+md("## 7. Progression - epoch 10 -> 50 -> 100 (prior samples)\n\nThe timeline shows how the sampled faces change as the loss stack trains. The limitation remains visible: the VAE becomes more structured and detailed, but it still tends toward smoother faces than GAN or DDPM samples."),
 code("""\
 check_epochs = [10, 50, 100]
 for name in run_names:
@@ -867,7 +888,7 @@ complementary losses, but even the selected recipe remains a speed-oriented
 family rather than the strongest quality candidate.
 """),
 ]
-write_nb("phase3_analysis", cells)
+write_nb(NOTEBOOK_SEQUENCE["phase3"], cells)
 # PHASE 4 - DDPM schedule, target, and width evolution
@@ -879,7 +900,7 @@ def build_phase4():
 cells = [
 md(f"""\
-# Phase 4 - DDPM Progression
+# 05 - DDPM Recipe Progression
 Phase 4 applies the same report logic to diffusion models. The pipeline is
 already fixed, so this notebook isolates the DDPM recipe: schedule, prediction
@@ -957,7 +978,7 @@ df = pd.DataFrame(rows).sort_values("Best FID")
 df.style.format({"FID@25": "{:.1f}", "FID@50": "{:.1f}", "FID@100": "{:.1f}",
 "Best FID": "{:.1f}", "Loss@100": "{:.4f}", "Train (min)": "{:.1f}"})
 """),
-md("## 3. FID curves - progression"),
+md("## 3. FID curves - progression\n\nThe curves make the DDPM recipe evolution visible. The main evidence is not only that the wider final model wins, but that the big jump happens when the prediction target changes to v-prediction."),
 code("""\
 fig, ax = plt.subplots(figsize=(10, 5))
 cmap = plt.cm.cividis
@@ -983,7 +1004,7 @@ ax.set_xlabel("Epoch"); ax.set_ylabel("MSE on prediction target")
 ax.set_title("Loss (epsilon-MSE and v-MSE are not directly comparable)")
 ax.legend(); plt.tight_layout(); plt.show()
 """),
-md("## 5. Sample grids - epoch 100"),
+md("## 5. Sample grids - epoch 100\n\nThese grids show the qualitative side of the FID drop. They should be read as independent samples from each checkpoint, with attention to global face structure, texture noise, and artifact frequency."),
 code("""\
 fig, axes = plt.subplots(1, 4, figsize=(16, 4.5))
 for ax, name in zip(axes, run_names):
@@ -1054,7 +1075,7 @@ base_ch=192, and attention at 32/16/8 for the final comparison.
 into the strongest quality candidate for Phase 5.
 """),
 ]
-write_nb("phase4_analysis", cells)
+write_nb(NOTEBOOK_SEQUENCE["phase4"], cells)
 # PHASE 5 - Cross-family final comparison
@@ -1069,7 +1090,7 @@ def build_phase5():
 cells = [
 md(f"""\
-# Phase 5 - Final Comparison
+# 06 - Final Family Comparison
 Phase 5 is the project-level comparison. It loads the already trained best
 recipes from the GAN, VAE, and DDPM branches and compares their saved logs,
@@ -1136,7 +1157,7 @@ ax.set_title("Final comparison: quality vs training time")
 ax.grid(alpha=0.25)
 plt.tight_layout(); plt.show()
 """),
-md("## 3. FID curves - all three families"),
+md("## 3. FID curves - all three families\n\nThis plot puts the selected family recipes on one timeline. It is needed because the best final FID alone does not show convergence behavior: DDPM reaches the best quality, GAN remains close with less time, and VAE is fast but saturates at a higher FID."),
 code("""\
 fig, ax = plt.subplots(figsize=(10, 5))
 for fam, info in FAMILIES.items():
@@ -1222,6 +1243,11 @@ Smooth interpolation between two latent codes reveals whether the generator has
 learned a continuous manifold rather than a sparse memorisation. DDPM has no
 encoder, so this section is GAN/VAE only.
+The interpolation figures are not a ranking metric. They are included to make
+the learned representation easier to inspect: smooth transitions support the
+claim that the models learned a face manifold, while sudden jumps would suggest
+fragile or memorised structure.
 **Checkpoint loading note:** the cell below uses the same priority as
 `tools/sampling.py`: `final_ema` first, then `best_ema` as fallback. This avoids
 using a best-FID EMA snapshot that may have been saved very early for a
@@ -1340,14 +1366,14 @@ The final comparison supports DDPM as the best-quality generator and GAN as the
 best practical compromise.
 """),
 ]
-write_nb("phase5_analysis", cells)
+write_nb(NOTEBOOK_SEQUENCE["phase5"], cells)
 # PHASE 6 - Final selected sample showcase
 def build_phase6():
 cells = [
 md("""\
-# Phase 6 - Final Selected Samples
+# 07 - Final Sample Showcase
 This final notebook is a small showcase chapter. Phase 5 compared the model
 families quantitatively; this notebook selects the three strongest individual
@@ -1565,11 +1591,12 @@ large generated pool, so they should be used as final qualitative examples, not
 as a replacement for the full distribution-level metrics.
 """),
 ]
-write_nb("phase6_final_showcase", cells)
+write_nb(NOTEBOOK_SEQUENCE["phase6"], cells)
 if __name__ == "__main__":
 print("Building notebooks...")
+remove_old_generated_notebooks()
 build_phase0()
 build_phase1()
 build_phase2()