{ "cells": [ { "cell_type": "markdown", "id": "2d43e849", "metadata": {}, "source": [ "# 05 - DDPM Recipe Progression\n", "\n", "Phase 4 applies the same report logic to diffusion models. The pipeline is\n", "already fixed, so this notebook isolates the DDPM recipe: schedule, prediction\n", "target, and backbone width.\n", "\n", "The story is stepwise. A cosine schedule helps only slightly (best FID 134.5 to\n", "132.3), v-prediction is the major gain (132.3 to 34.5), and the wider backbone\n", "becomes useful only after the target and schedule are improved (34.5 to 30.0).\n", "\n", "## What this phase changes\n", "\n", "| Run | Recipe change |\n", "|---|---|\n", "| `p4_1_ddpm_linear` | Linear noise schedule, epsilon-prediction |\n", "| `p4_2_ddpm_cosine` | Cosine noise schedule |\n", "| `p4_3_ddpm_vpred` | v-prediction target |\n", "| `p4_4_ddpm_wider` | Wider U-Net: base channels 192 with attention at 32/16/8 |\n", "\n", "Sampling previews use DDIM-50. Logged FID uses DDIM-100 against the saved real\n", "reference set.\n", "\n", "**Headline result:** `p4_4_ddpm_wider` reaches **best FID = 30.0**.\n", "\n", "## How to read DDPM sample grids\n", "\n", "The DDPM grids should not be read as the same faces improving from epoch to\n", "epoch. GAN and VAE previews can reuse a fixed latent grid, so each position can\n", "look like the same latent code becoming sharper over training. A DDPM preview\n", "starts from noise and runs a stochastic reverse-diffusion sampler. Unless the\n", "exact initial noise and sampler randomness are fixed and stored, each epoch\n", "preview is a fresh draw from the model.\n", "\n", "So for DDPM, the progression panels show distribution-level improvement:\n", "cleaner faces, fewer artifacts, and better global structure. 
They are not\n", "identity-by-identity refinements of the same preview images.\n" ] }, { "cell_type": "markdown", "id": "fe4b4147", "metadata": {}, "source": [ "### Reference: Phase 0 baseline from the same family\n", "\n", "`p0_ddpm` was a vanilla DDPM (linear schedule, epsilon-prediction, base_ch=128) on raw\n", "unaligned data. Outputs were noisy face-shaped textures. Phase 4 fixes the\n", "pipeline (aligned 64) and walks through the standard set of post-2020 DDPM\n", "improvements one at a time.\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "ad73d7ca", "metadata": { "execution": { "iopub.execute_input": "2026-05-14T19:09:11.615922Z", "iopub.status.busy": "2026-05-14T19:09:11.615922Z", "iopub.status.idle": "2026-05-14T19:09:13.772612Z", "shell.execute_reply": "2026-05-14T19:09:13.769968Z" } }, "outputs": [], "source": [ "import json\n", "from pathlib import Path\n", "\n", "import matplotlib.pyplot as plt\n", "import matplotlib.image as mpimg\n", "import numpy as np\n", "import pandas as pd\n", "\n", "plt.rcParams.update({\"figure.dpi\": 120, \"font.size\": 10})\n", "\n", "# Fall back to print when running outside IPython/Jupyter.\n", "try:\n", "    display\n", "except NameError:\n", "    def display(obj):\n", "        print(obj)\n", "\n", "def find_generator_root():\n", "    # Walk up from the working directory until outputs/logs and outputs/samples are found.\n", "    for base in [Path.cwd(), *Path.cwd().parents]:\n", "        for candidate in [base, base / \"generator\"]:\n", "            if (candidate / \"outputs\" / \"logs\").exists() and (candidate / \"outputs\" / \"samples\").exists():\n", "                return candidate.resolve()\n", "    raise FileNotFoundError(\"Could not locate generator/outputs from the current working directory\")\n", "\n", "GENERATOR_ROOT = find_generator_root()\n", "PROJECT_ROOT = GENERATOR_ROOT.parent\n", "OUTPUTS = GENERATOR_ROOT / \"outputs\"\n", "LOGS = OUTPUTS / \"logs\"\n", "SAMPLES = OUTPUTS / \"samples\"\n", "\n", "\n", "def load_log(name):\n", "    # Return the parsed JSON log for a run, or None if the log file is missing.\n", "    p = LOGS / f\"{name}.json\"\n", "    if not p.exists():\n", "        return None\n", "    with p.open() as f:\n", "        return json.load(f)\n", "\n", "def get_fid(log, epoch):\n", "    # FID history is keyed by epoch number stored as a string.\n", "    fid = log.get(\"history\", {}).get(\"fid\", 
{})\n", "    return fid.get(str(epoch))\n", "\n", "def fid_series(log):\n", "    # Return (epochs, fid_values) sorted by epoch.\n", "    fid = log.get(\"history\", {}).get(\"fid\", {})\n", "    items = sorted((int(k), v) for k, v in fid.items())\n", "    return [e for e, _ in items], [v for _, v in items]\n", "\n", "def show_image_or_missing(ax, path, title=None):\n", "    if path.exists():\n", "        ax.imshow(mpimg.imread(str(path)))\n", "    else:\n", "        ax.text(0.5, 0.5, f\"missing artifact\\n{path.name}\", ha=\"center\", va=\"center\", transform=ax.transAxes)\n", "    if title:\n", "        ax.set_title(title, fontsize=9)\n", "    ax.axis(\"off\")\n", "\n" ] }, { "cell_type": "markdown", "id": "2fd648bc", "metadata": {}, "source": [ "## 1. Load experiment logs\n", "\n", "The notebook reads existing DDPM logs only. Sampling and FID values are already saved." ] }, { "cell_type": "code", "execution_count": 2, "id": "5abd8b09", "metadata": { "execution": { "iopub.execute_input": "2026-05-14T19:09:13.780145Z", "iopub.status.busy": "2026-05-14T19:09:13.779153Z", "iopub.status.idle": "2026-05-14T19:09:13.800005Z", "shell.execute_reply": "2026-05-14T19:09:13.797633Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " p4_1_ddpm_linear: OK\n", " p4_2_ddpm_cosine: OK\n", " p4_3_ddpm_vpred: OK\n", " p4_4_ddpm_wider: OK\n" ] } ], "source": [ "run_names = [\"p4_1_ddpm_linear\", \"p4_2_ddpm_cosine\", \"p4_3_ddpm_vpred\", \"p4_4_ddpm_wider\"]\n", "run_labels = {\n", "    \"p4_1_ddpm_linear\": \"4.1 linear / epsilon\",\n", "    \"p4_2_ddpm_cosine\": \"4.2 cosine / epsilon\",\n", "    \"p4_3_ddpm_vpred\": \"4.3 cosine / v\",\n", "    \"p4_4_ddpm_wider\": \"4.4 wider net\",\n", "}\n", "runs = {n: load_log(n) for n in run_names}\n", "runs = {k: v for k, v in runs.items() if v}\n", "for n in run_names: print(f\" {n}: {'OK' if n in runs else 'MISSING'}\")\n" ] }, { "cell_type": "markdown", "id": "4d563bd4", "metadata": {}, "source": [ "## 2. 
FID comparison table\n", "\n", "The table shows whether each recipe change improves generation quality under the saved DDIM-100 FID protocol." ] }, { "cell_type": "code", "execution_count": 3, "id": "bf039a10", "metadata": { "execution": { "iopub.execute_input": "2026-05-14T19:09:13.808455Z", "iopub.status.busy": "2026-05-14T19:09:13.807458Z", "iopub.status.idle": "2026-05-14T19:09:14.118770Z", "shell.execute_reply": "2026-05-14T19:09:14.115475Z" } }, "outputs": [ { "data": { "text/html": [ "\n", "
| | Run | FID@25 | FID@50 | FID@100 | Best FID | Loss@100 | Train (min) |
|---|---|---|---|---|---|---|---|
| 3 | 4.4 wider net | 325.0 | 170.9 | 30.0 | 30.0 | 0.0582 | 536.4 |
| 2 | 4.3 cosine / v | 315.9 | 160.7 | 34.5 | 34.5 | 0.0594 | 278.8 |
| 1 | 4.2 cosine / epsilon | 249.7 | 282.2 | 132.3 | 132.3 | 0.0285 | 258.8 |
| 0 | 4.1 linear / epsilon | 333.3 | 311.4 | 134.5 | 134.5 | 0.0150 | 259.9 |