DRL_PROJ/classifier/notebooks/07_phase4_data_scaling_analysis.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 07 - Phase 4 Data-Scaling Analysis\n",
    "\n",
    "Phase 4 follows directly from Phase 3: with the best model families identified at the 20% data setting, it asks whether more training data improves overall performance and source generalization.\n",
    "\n",
    "In the repository state used to create this notebook, Phase 4 configs exist but no `p4_*` logs or checkpoints are present under `classifier/outputs`. For that reason, this notebook is result-gated: it documents the planned experiment matrix now, and the analysis cells automatically switch on when the corresponding logs are added later.\n",
    "\n",
    "No Phase 4 metric is claimed unless it is loaded from an existing `classifier/outputs/logs/p4_*.json` file.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Project root: c:\\Users\\diogo\\Documents\\MIA_UP\\2_Semestre\\DRL\\DRL_2\\DRL_PROJ\n"
     ]
    }
   ],
   "source": [
    "from __future__ import annotations\n",
    "\n",
    "import json\n",
    "import sys\n",
    "from pathlib import Path\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
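    "import pandas as pd\n",
    "from IPython.display import display  # explicit, so the cells do not rely on the injected builtin\n",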
    "\n",
    "\n",
    "def find_project_root(start: Path | None = None) -> Path:\n",
    "    \"\"\"Find DRL_PROJ whether the notebook runs from repo root or classifier/notebooks.\"\"\"\n",
    "    start = Path.cwd() if start is None else Path(start)\n",
    "    for candidate in [start, *start.parents]:\n",
    "        if (candidate / \"classifier\").is_dir() and (candidate / \"docs\" / \"DRL_Project.md\").exists():\n",
    "            return candidate\n",
    "    raise RuntimeError(\"Could not find DRL_PROJ root. Run this notebook from inside the repository.\")\n",
    "\n",
    "\n",
    "PROJECT_ROOT = find_project_root()\n",
    "CLASSIFIER_ROOT = PROJECT_ROOT / \"classifier\"\n",
    "if str(CLASSIFIER_ROOT) not in sys.path:\n",
    "    sys.path.insert(0, str(CLASSIFIER_ROOT))\n",
    "\n",
    "CONFIGS_DIR = CLASSIFIER_ROOT / \"configs\"\n",
    "LOGS_DIR = CLASSIFIER_ROOT / \"outputs\" / \"logs\"\n",
    "MODELS_DIR = CLASSIFIER_ROOT / \"outputs\" / \"models\"\n",
    "FIGURES_DIR = CLASSIFIER_ROOT / \"outputs\" / \"figures\"\n",
    "ANALYSIS_DIR = CLASSIFIER_ROOT / \"outputs\" / \"analysis\"\n",
    "FIGURES_DIR.mkdir(parents=True, exist_ok=True)\n",
    "ANALYSIS_DIR.mkdir(parents=True, exist_ok=True)\n",
    "\n",
    "print(f\"Project root: {PROJECT_ROOT}\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Planned Phase 4 matrix\n",
    "\n",
    "The Phase 4 configs keep the Phase 3 preprocessing setup: pretrained backbones, 224x224 input, face-cropped classifier data, no augmentation, and the same cross-validation protocol. These settings are held constant; the only variable that changes is the data fraction: 20%, 50%, and 100%.\n",
    "\n",
    "The model families selected for scaling are ResNet50, EfficientNet-B0, and ConvNeXt-Tiny. They are the strongest Phase 3 families and represent different capacity/efficiency tradeoffs.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>run</th>\n",
       "      <th>backbone</th>\n",
       "      <th>subsample</th>\n",
       "      <th>image_size</th>\n",
       "      <th>data_dir</th>\n",
       "      <th>augment</th>\n",
       "      <th>pretrained</th>\n",
       "      <th>epochs</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>p4_convnext_tiny_20pct</td>\n",
       "      <td>convnext_tiny</td>\n",
       "      <td>0.2</td>\n",
       "      <td>224</td>\n",
       "      <td>cropped/classifier</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>p4_convnext_tiny_50pct</td>\n",
       "      <td>convnext_tiny</td>\n",
       "      <td>0.5</td>\n",
       "      <td>224</td>\n",
       "      <td>cropped/classifier</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>p4_convnext_tiny_100pct</td>\n",
       "      <td>convnext_tiny</td>\n",
       "      <td>1.0</td>\n",
       "      <td>224</td>\n",
       "      <td>cropped/classifier</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>p4_efficientnet_b0_20pct</td>\n",
       "      <td>efficientnet_b0</td>\n",
       "      <td>0.2</td>\n",
       "      <td>224</td>\n",
       "      <td>cropped/classifier</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>p4_efficientnet_b0_50pct</td>\n",
       "      <td>efficientnet_b0</td>\n",
       "      <td>0.5</td>\n",
       "      <td>224</td>\n",
       "      <td>cropped/classifier</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>p4_efficientnet_b0_100pct</td>\n",
       "      <td>efficientnet_b0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>224</td>\n",
       "      <td>cropped/classifier</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>p4_resnet50_20pct</td>\n",
       "      <td>resnet50</td>\n",
       "      <td>0.2</td>\n",
       "      <td>224</td>\n",
       "      <td>cropped/classifier</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>p4_resnet50_50pct</td>\n",
       "      <td>resnet50</td>\n",
       "      <td>0.5</td>\n",
       "      <td>224</td>\n",
       "      <td>cropped/classifier</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>p4_resnet50_100pct</td>\n",
       "      <td>resnet50</td>\n",
       "      <td>1.0</td>\n",
       "      <td>224</td>\n",
       "      <td>cropped/classifier</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                         run         backbone  subsample  image_size  \\\n",
       "1     p4_convnext_tiny_20pct    convnext_tiny        0.2         224   \n",
       "2     p4_convnext_tiny_50pct    convnext_tiny        0.5         224   \n",
       "0    p4_convnext_tiny_100pct    convnext_tiny        1.0         224   \n",
       "4   p4_efficientnet_b0_20pct  efficientnet_b0        0.2         224   \n",
       "5   p4_efficientnet_b0_50pct  efficientnet_b0        0.5         224   \n",
       "3  p4_efficientnet_b0_100pct  efficientnet_b0        1.0         224   \n",
       "7          p4_resnet50_20pct         resnet50        0.2         224   \n",
       "8          p4_resnet50_50pct         resnet50        0.5         224   \n",
       "6         p4_resnet50_100pct         resnet50        1.0         224   \n",
       "\n",
       "             data_dir  augment  pretrained  epochs  \n",
       "1  cropped/classifier    False        True      15  \n",
       "2  cropped/classifier    False        True      15  \n",
       "0  cropped/classifier    False        True      15  \n",
       "4  cropped/classifier    False        True      15  \n",
       "5  cropped/classifier    False        True      15  \n",
       "3  cropped/classifier    False        True      15  \n",
       "7  cropped/classifier    False        True      15  \n",
       "8  cropped/classifier    False        True      15  \n",
       "6  cropped/classifier    False        True      15  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>subsample</th>\n",
       "      <th>0.2</th>\n",
       "      <th>0.5</th>\n",
       "      <th>1.0</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>backbone</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>convnext_tiny</th>\n",
       "      <td>p4_convnext_tiny_20pct</td>\n",
       "      <td>p4_convnext_tiny_50pct</td>\n",
       "      <td>p4_convnext_tiny_100pct</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>efficientnet_b0</th>\n",
       "      <td>p4_efficientnet_b0_20pct</td>\n",
       "      <td>p4_efficientnet_b0_50pct</td>\n",
       "      <td>p4_efficientnet_b0_100pct</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>resnet50</th>\n",
       "      <td>p4_resnet50_20pct</td>\n",
       "      <td>p4_resnet50_50pct</td>\n",
       "      <td>p4_resnet50_100pct</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "subsample                             0.2                       0.5  \\\n",
       "backbone                                                              \n",
       "convnext_tiny      p4_convnext_tiny_20pct    p4_convnext_tiny_50pct   \n",
       "efficientnet_b0  p4_efficientnet_b0_20pct  p4_efficientnet_b0_50pct   \n",
       "resnet50                p4_resnet50_20pct         p4_resnet50_50pct   \n",
       "\n",
       "subsample                              1.0  \n",
       "backbone                                    \n",
       "convnext_tiny      p4_convnext_tiny_100pct  \n",
       "efficientnet_b0  p4_efficientnet_b0_100pct  \n",
       "resnet50                p4_resnet50_100pct  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
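    "def load_json(path: Path) -> dict:\n",
    "    \"\"\"Read a UTF-8 JSON file into a dict.\"\"\"\n",
    "    return json.loads(path.read_text(encoding=\"utf-8\"))\n",
    "\n",
    "\n",
    "def resolve_config(path: Path) -> dict:\n",
    "    \"\"\"Load a config, recursively merging any parent named by its `extends` key.\"\"\"\n",
    "    cfg = load_json(path)\n",
    "    parent = cfg.pop(\"extends\", None)\n",
    "    if parent:\n",
    "        # The parent path is relative to the child; child keys override parent keys (shallow merge).\n",
    "        base = resolve_config(path.parent / parent)\n",
    "        base.update(cfg)\n",
    "        cfg = base\n",
    "    return cfg\n",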
    "\n",
    "\n",
    "phase4_configs = []\n",
    "for path in sorted((CONFIGS_DIR / \"phase4\").glob(\"p4_*.json\")):\n",
    "    cfg = resolve_config(path)\n",
    "    phase4_configs.append({\n",
    "        \"run\": cfg.get(\"run_name\", path.stem),\n",
    "        \"backbone\": cfg.get(\"backbone\"),\n",
    "        \"subsample\": cfg.get(\"subsample\"),\n",
    "        \"image_size\": cfg.get(\"image_size\"),\n",
    "        \"data_dir\": cfg.get(\"data_dir\"),\n",
    "        \"augment\": cfg.get(\"augment\", False),\n",
    "        \"pretrained\": cfg.get(\"pretrained\", True),\n",
    "        \"epochs\": cfg.get(\"epochs\"),\n",
    "    })\n",
    "\n",
    "config_df = pd.DataFrame(phase4_configs).sort_values([\"backbone\", \"subsample\"])\n",
    "display(config_df)\n",
    "\n",
    "matrix = config_df.pivot(index=\"backbone\", columns=\"subsample\", values=\"run\")\n",
    "display(matrix)\n"
   ]
  },
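  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The planned matrix is a full grid of three backbones by three data fractions, so nine runs are expected. The sketch below shows one way to enumerate that grid and compare it with the run names actually found. It assumes the `p4_<backbone>_<pct>pct` naming pattern visible in the table above; `expected_run_name` and the `found` set are hypothetical names used only for illustration.\n",
    "\n",
    "```python\n",
    "from itertools import product\n",
    "\n",
    "# Backbones and data fractions as listed in the Phase 4 config table.\n",
    "BACKBONES = ['convnext_tiny', 'efficientnet_b0', 'resnet50']\n",
    "FRACTIONS = [0.2, 0.5, 1.0]\n",
    "\n",
    "\n",
    "def expected_run_name(backbone: str, fraction: float) -> str:\n",
    "    # Assumed naming pattern, inferred from the run names in the table.\n",
    "    return f'p4_{backbone}_{round(fraction * 100)}pct'\n",
    "\n",
    "\n",
    "# Full grid: every backbone at every data fraction.\n",
    "expected_runs = {expected_run_name(b, f) for b, f in product(BACKBONES, FRACTIONS)}\n",
    "print(len(expected_runs))  # 9\n",
    "\n",
    "# Hypothetical set of run names found on disk; anything else is still missing.\n",
    "found = {'p4_resnet50_20pct', 'p4_resnet50_50pct'}\n",
    "missing = sorted(expected_runs - found)\n",
    "print(missing)\n",
    "```\n",
    "\n",
    "A check like this would flag a config that was never written, which a scan of existing config files alone cannot detect.\n"
   ]
  },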
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Result gate\n",
    "\n",
    "The cell below checks for `p4_*` logs. If none are present, Phase 4 remains a planned experiment and the notebook stops at the design/status interpretation. If logs are added later, the same notebook will load them and produce scaling curves, source diagnostics, and report-ready conclusions from those logs.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "No Phase 4 result logs found under classifier/outputs/logs.\n",
      "Missing planned runs:\n",
      "- p4_convnext_tiny_100pct\n",
      "- p4_convnext_tiny_20pct\n",
      "- p4_convnext_tiny_50pct\n",
      "- p4_efficientnet_b0_100pct\n",
      "- p4_efficientnet_b0_20pct\n",
      "- p4_efficientnet_b0_50pct\n",
      "- p4_resnet50_100pct\n",
      "- p4_resnet50_20pct\n",
      "- p4_resnet50_50pct\n"
     ]
    }
   ],
   "source": [
    "def log_path(run_name: str) -> Path:\n",
    "    return LOGS_DIR / f\"{run_name}.json\"\n",
    "\n",
    "\n",
    "def load_run_if_present(run_name: str) -> dict | None:\n",
    "    path = log_path(run_name)\n",
    "    return load_json(path) if path.exists() else None\n",
    "\n",
    "\n",
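    "def agg_metric(results: dict, metric: str, field: str = \"mean\") -> float:\n",
    "    \"\"\"Return one aggregated statistic for a metric, or NaN if the log lacks it.\"\"\"\n",
    "    return results.get(\"aggregated_metrics\", {}).get(metric, {}).get(field, np.nan)\n",
    "\n",
    "\n",
    "def checkpoint_mb(run_name: str) -> float:\n",
    "    \"\"\"Size in MiB of the fold-0 best checkpoint, or NaN if it is absent.\"\"\"\n",
    "    path = MODELS_DIR / f\"{run_name}_fold0_best.pt\"\n",
    "    return path.stat().st_size / (1024 * 1024) if path.exists() else np.nan\n",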
    "\n",
    "\n",
    "result_rows = []\n",
    "missing_runs = []\n",
    "for row in phase4_configs:\n",
    "    run_name = row[\"run\"]\n",
    "    results = load_run_if_present(run_name)\n",
    "    if results is None:\n",
    "        missing_runs.append(run_name)\n",
    "        continue\n",
    "    cfg = {**row, **results.get(\"config\", {})}\n",
    "    result_rows.append({\n",
    "        \"run\": run_name,\n",
    "        \"backbone\": cfg.get(\"backbone\"),\n",
    "        \"subsample\": cfg.get(\"subsample\", row.get(\"subsample\")),\n",
    "        \"auc\": agg_metric(results, \"auc_roc\"),\n",
    "        \"auc_std\": agg_metric(results, \"auc_roc\", \"std\"),\n",
    "        \"accuracy\": agg_metric(results, \"accuracy\"),\n",
    "        \"f1\": agg_metric(results, \"f1\"),\n",
    "        \"checkpoint_mb\": checkpoint_mb(run_name),\n",
    "    })\n",
    "\n",
    "phase4_results_df = pd.DataFrame(result_rows)\n",
    "if phase4_results_df.empty:\n",
    "    print(\"No Phase 4 result logs found under classifier/outputs/logs.\")\n",
    "    print(\"Missing planned runs:\")\n",
    "    for run_name in missing_runs:\n",
    "        print(f\"- {run_name}\")\n",
    "else:\n",
    "    phase4_results_df = phase4_results_df.sort_values([\"backbone\", \"subsample\"])\n",
    "    display(\n",
    "        phase4_results_df.style.format({\n",
    "            \"subsample\": \"{:.1f}\",\n",
    "            \"auc\": \"{:.4f}\",\n",
    "            \"auc_std\": \"{:.4f}\",\n",
    "            \"accuracy\": \"{:.4f}\",\n",
    "            \"f1\": \"{:.4f}\",\n",
    "            \"checkpoint_mb\": \"{:.1f}\",\n",
    "        })\n",
    "    )\n",
    "    if missing_runs:\n",
    "        print(\"Some planned Phase 4 runs are still missing:\")\n",
    "        for run_name in missing_runs:\n",
    "            print(f\"- {run_name}\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If the status cell reports no logs, the correct interpretation is: Phase 4 has been designed but not yet analyzed. The report can describe the intended purpose, but it should not include Phase 4 performance claims.\n",
    "\n",
    "When logs exist, the next sections answer three questions: does more data improve each backbone, which backbone benefits most, and does scaling reduce source-specific weakness?\n"
   ]
  },
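  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For the first question, a raw AUC difference is easier to interpret when set against the fold-to-fold spread that the logs report as `auc_std`. The sketch below is a rough heuristic, not a significance test: it assumes the two standard deviations can be combined in quadrature and treats a gain larger than `k` times that value as worth noting. The function name `gain_exceeds_noise`, the threshold `k`, and all numbers are hypothetical.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "\n",
    "def gain_exceeds_noise(auc_low, std_low, auc_high, std_high, k=1.0):\n",
    "    # Gain from the smaller to the larger data fraction.\n",
    "    gain = auc_high - auc_low\n",
    "    # Combined spread of the two runs (assumes independent fold noise).\n",
    "    noise = math.sqrt(std_low ** 2 + std_high ** 2)\n",
    "    return gain, gain > k * noise\n",
    "\n",
    "\n",
    "# Placeholder values for illustration only, not measured results.\n",
    "print(gain_exceeds_noise(0.90, 0.01, 0.93, 0.01))\n",
    "print(gain_exceeds_noise(0.90, 0.02, 0.91, 0.02))\n",
    "```\n",
    "\n",
    "With only a handful of folds the spread itself is a noisy estimate, so a check like this is better used to temper claims than to certify them.\n"
   ]
  },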
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Scaling curves, once results exist\n",
    "\n",
    "These cells are guarded. They produce figures only when at least one `p4_*` log is available.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Skipping scaling plots because no p4 logs are available yet.\n"
     ]
    }
   ],
   "source": [
    "if phase4_results_df.empty:\n",
    "    print(\"Skipping scaling plots because no p4 logs are available yet.\")\n",
    "else:\n",
    "    fig, axes = plt.subplots(1, 3, figsize=(13, 4), sharex=True)\n",
    "    for ax, metric, title in zip(axes, [\"auc\", \"accuracy\", \"f1\"], [\"AUC\", \"Accuracy\", \"F1\"]):\n",
    "        for backbone, sub in phase4_results_df.groupby(\"backbone\"):\n",
    "            sub = sub.sort_values(\"subsample\")\n",
    "            ax.plot(sub[\"subsample\"] * 100, sub[metric], marker=\"o\", label=backbone)\n",
    "        ax.set_xlabel(\"Training data used (%)\")\n",
    "        ax.set_ylabel(title)\n",
    "        ax.set_title(f\"Phase 4 {title} scaling\")\n",
    "        ax.grid(alpha=0.25)\n",
    "    axes[0].legend(fontsize=8)\n",
    "    fig.tight_layout()\n",
    "    fig.savefig(FIGURES_DIR / \"07_phase4_scaling_curves.png\", dpi=180, bbox_inches=\"tight\")\n",
    "    plt.show()\n",
    "\n",
    "    gains = []\n",
    "    for backbone, sub in phase4_results_df.groupby(\"backbone\"):\n",
    "        sub = sub.set_index(\"subsample\")\n",
    "        if 0.2 in sub.index and 1.0 in sub.index:\n",
    "            gains.append({\n",
    "                \"backbone\": backbone,\n",
    "                \"auc_20pct\": sub.loc[0.2, \"auc\"],\n",
    "                \"auc_100pct\": sub.loc[1.0, \"auc\"],\n",
    "                \"auc_gain\": sub.loc[1.0, \"auc\"] - sub.loc[0.2, \"auc\"],\n",
    "                \"accuracy_gain\": sub.loc[1.0, \"accuracy\"] - sub.loc[0.2, \"accuracy\"],\n",
    "                \"f1_gain\": sub.loc[1.0, \"f1\"] - sub.loc[0.2, \"f1\"],\n",
    "            })\n",
    "    gains_df = pd.DataFrame(gains)\n",
    "    display(gains_df.style.format({\"auc_20pct\": \"{:.4f}\", \"auc_100pct\": \"{:.4f}\", \"auc_gain\": \"{:+.4f}\", \"accuracy_gain\": \"{:+.4f}\", \"f1_gain\": \"{:+.4f}\"}))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Source diagnostics, once results exist\n",
    "\n",
    "The most important Phase 4 question goes beyond whether global AUC rises: it is whether extra data improves the weak source pairs found earlier, especially generalization around `text2img` and `insight`.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Skipping source diagnostics because no p4 logs are available yet.\n"
     ]
    }
   ],
   "source": [
    "def pairwise_rows(run_name: str, results: dict) -> list[dict]:\n",
    "    rows = []\n",
    "    for pair, metrics in results.get(\"aggregated_pairwise\", {}).items():\n",
    "        rows.append({\n",
    "            \"run\": run_name,\n",
    "            \"pair\": pair,\n",
    "            \"pairwise_auc\": metrics.get(\"auc_roc\", {}).get(\"mean\", np.nan),\n",
    "            \"pairwise_f1\": metrics.get(\"f1\", {}).get(\"mean\", np.nan),\n",
    "            \"pairwise_accuracy\": metrics.get(\"accuracy\", {}).get(\"mean\", np.nan),\n",
    "        })\n",
    "    return rows\n",
    "\n",
    "\n",
    "if phase4_results_df.empty:\n",
    "    print(\"Skipping source diagnostics because no p4 logs are available yet.\")\n",
    "else:\n",
    "    pair_rows = []\n",
    "    for _, row in phase4_results_df.iterrows():\n",
    "        results = load_run_if_present(row[\"run\"])\n",
    "        for pair_row in pairwise_rows(row[\"run\"], results):\n",
    "            pair_rows.append({**pair_row, \"backbone\": row[\"backbone\"], \"subsample\": row[\"subsample\"]})\n",
    "    pair_df = pd.DataFrame(pair_rows)\n",
    "    display(pair_df.sort_values([\"pair\", \"backbone\", \"subsample\"]).style.format({\"subsample\": \"{:.1f}\", \"pairwise_auc\": \"{:.4f}\", \"pairwise_f1\": \"{:.4f}\", \"pairwise_accuracy\": \"{:.4f}\"}))\n",
    "\n",
    "    for pair in sorted(pair_df[\"pair\"].unique()):\n",
    "        fig, ax = plt.subplots(figsize=(6, 3.6))\n",
    "        sub_pair = pair_df[pair_df[\"pair\"] == pair]\n",
    "        for backbone, sub in sub_pair.groupby(\"backbone\"):\n",
    "            sub = sub.sort_values(\"subsample\")\n",
    "            ax.plot(sub[\"subsample\"] * 100, sub[\"pairwise_auc\"], marker=\"o\", label=backbone)\n",
    "        ax.set_title(f\"Phase 4 source-pair scaling: {pair}\")\n",
    "        ax.set_xlabel(\"Training data used (%)\")\n",
    "        ax.set_ylabel(\"Pairwise AUC\")\n",
    "        ax.grid(alpha=0.25)\n",
    "        ax.legend(fontsize=8)\n",
    "        fig.tight_layout()\n",
    "        fig.savefig(FIGURES_DIR / f\"07_phase4_{pair}_scaling.png\", dpi=180, bbox_inches=\"tight\")\n",
    "        plt.show()\n"
   ]
  },
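  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The per-pair curves can also be summarized as a single gain per backbone and source pair, mirroring the global gain table from the scaling section. The sketch below shows one way to do that in plain Python. The `pairwise_auc` mapping, the pair names, and every number in it are hypothetical placeholders, and `pairwise_gains` is an illustrative helper rather than part of the project code.\n",
    "\n",
    "```python\n",
    "# Hypothetical pairwise AUC values keyed by (backbone, pair, data fraction).\n",
    "# These numbers are placeholders for illustration, not measured results.\n",
    "pairwise_auc = {\n",
    "    ('resnet50', 'pair_a', 0.2): 0.80,\n",
    "    ('resnet50', 'pair_a', 1.0): 0.85,\n",
    "    ('resnet50', 'pair_b', 0.2): 0.70,\n",
    "    ('resnet50', 'pair_b', 1.0): 0.69,\n",
    "    ('resnet50', 'pair_c', 0.2): 0.75,  # no 100% run available\n",
    "}\n",
    "\n",
    "\n",
    "def pairwise_gains(values, low=0.2, high=1.0):\n",
    "    # Map (backbone, pair) to the AUC at `high` minus the AUC at `low`.\n",
    "    # Pairs missing either endpoint are skipped rather than guessed.\n",
    "    gains = {}\n",
    "    for (backbone, pair, fraction), auc in values.items():\n",
    "        if fraction == low and (backbone, pair, high) in values:\n",
    "            gains[(backbone, pair)] = values[(backbone, pair, high)] - auc\n",
    "    return gains\n",
    "\n",
    "\n",
    "pair_gains = pairwise_gains(pairwise_auc)\n",
    "# Pairs whose AUC did not improve with more data are the ones to inspect first.\n",
    "regressed = sorted(key for key, gain in pair_gains.items() if gain <= 0)\n",
    "print(regressed)\n",
    "```\n",
    "\n",
    "Skipping incomplete pairs keeps the summary consistent with the result gate: no gain is reported unless both endpoint runs exist.\n"
   ]
  },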
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conclusion\n",
    "\n",
    "At the time this notebook was added, Phase 4 was a planned data-scaling analysis rather than a completed result chapter. The configs define a clean experiment: take the strongest Phase 3 families and train them at 20%, 50%, and 100% of the available face-cropped data.\n",
    "\n",
    "The report should not make positive or negative Phase 4 performance claims until `p4_*` logs are present. Once those logs exist, the key result to look for is not just a higher global AUC. The stronger claim would be that more data also improves pairwise source behavior, especially for the sources that exposed generalization limits earlier.\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}