{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Phase 4 — DDPM Evolution\n", "\n", "Iterate on the diffusion side: noise schedule, prediction target, model width.\n", "Sampling everywhere uses DDIM with 50 steps (matches training preview); FID\n", "uses DDIM-100 against 5000 real images.\n", "\n", "| Run | Step |\n", "|--------------------|--------------------------------------------------------|\n", "| `p4_1_ddpm_linear` | Linear β-schedule, ε-prediction (DDPM baseline) |\n", "| `p4_2_ddpm_cosine` | Cosine β-schedule (Nichol & Dhariwal) |\n", "| `p4_3_ddpm_vpred` | + v-prediction target (Salimans & Ho) |\n", "| `p4_4_ddpm_wider` | + base_ch 128 → 192, num_res_blocks 2, attn at 32/16/8 |\n", "\n", "**Headline result:** `p4_4_ddpm_wider` — **best FID = 30.0** at 100 epochs.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Reference: phase 0 baseline (same family)\n", "\n", "`p0_ddpm` was a vanilla DDPM (linear schedule, ε-prediction, base_ch=128) on raw\n", "un-aligned data. Outputs were noisy face-shaped textures. 
Phase 4 fixes the\n", "pipeline (aligned 64) and walks through the standard set of post-2020 DDPM\n", "improvements one at a time.\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import json\n", "from pathlib import Path\n", "\n", "import matplotlib.pyplot as plt\n", "import matplotlib.image as mpimg\n", "import numpy as np\n", "import pandas as pd\n", "\n", "plt.rcParams.update({\"figure.dpi\": 120, \"font.size\": 10})\n", "\n", "OUTPUTS = Path(\"../outputs\")\n", "LOGS = OUTPUTS / \"logs\"\n", "SAMPLES = OUTPUTS / \"samples\"\n", "\n", "\n", "# Return the parsed JSON log for a run, or None if the log file is missing.\n", "def load_log(name):\n", " p = LOGS / f\"{name}.json\"\n", " return json.loads(p.read_text()) if p.exists() else None\n", "\n", "# FID history is keyed by epoch stored as a string; returns None if absent.\n", "def get_fid(log, epoch):\n", " fid = log.get(\"history\", {}).get(\"fid\", {})\n", " return fid.get(str(epoch))\n", "\n", "# Return (epochs, fid_values) sorted by epoch number.\n", "def fid_series(log):\n", " fid = log.get(\"history\", {}).get(\"fid\", {})\n", " items = sorted((int(k), v) for k, v in fid.items())\n", " return [e for e, _ in items], [v for _, v in items]\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Load experiment logs" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " p4_1_ddpm_linear: OK\n", " p4_2_ddpm_cosine: OK\n", " p4_3_ddpm_vpred: OK\n", " p4_4_ddpm_wider: OK\n" ] } ], "source": [ "run_names = [\"p4_1_ddpm_linear\", \"p4_2_ddpm_cosine\", \"p4_3_ddpm_vpred\", \"p4_4_ddpm_wider\"]\n", "run_labels = {\n", " \"p4_1_ddpm_linear\": \"4.1 linear / ε\",\n", " \"p4_2_ddpm_cosine\": \"4.2 cosine / ε\",\n", " \"p4_3_ddpm_vpred\": \"4.3 cosine / v\",\n", " \"p4_4_ddpm_wider\": \"4.4 wider net\",\n", "}\n", "runs = {n: load_log(n) for n in run_names}\n", "runs = {k: v for k, v in runs.items() if v}\n", "for n in run_names: print(f\" {n}: {'OK' if n in runs else 'MISSING'}\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. 
FID comparison table" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
| | Run | FID@25 | FID@50 | FID@100 | Best FID | Loss@100 | Train (min) |\n", "
|---|---|---|---|---|---|---|---|\n", "
| 3 | 4.4 wider net | 325.0 | 170.9 | 30.0 | 30.0 | 0.0582 | 536.4 |\n", "
| 2 | 4.3 cosine / v | 315.9 | 160.7 | 34.5 | 34.5 | 0.0594 | 278.8 |\n", "
| 1 | 4.2 cosine / ε | 249.7 | 282.2 | 132.3 | 132.3 | 0.0285 | 258.8 |\n", "
| 0 | 4.1 linear / ε | 333.3 | 311.4 | 134.5 | 134.5 | 0.0150 | 259.9 |\n", "