Compare commits


17 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| jalf | a5dc093a15 | Delete generator/notebooks/_build.py | 2026-05-15 00:25:44 +01:00 |
| Johnny Fernandes | 1ed2b7a7a0 | Final final | 2026-05-14 23:19:26 +01:00 |
| Johnny Fernandes | afd26f47d2 | Final polish | 2026-05-14 21:16:03 +01:00 |
| DiogoCosta18 | 3bff7eefb0 | Generator notebooks with proper names + README | 2026-05-14 20:20:46 +01:00 |
| DiogoCosta18 | f46320f81e | Classifier notebooks | 2026-05-14 16:25:30 +01:00 |
| DiogoCosta18 | 2062a91985 | Classifier notebooks | 2026-05-14 16:20:33 +01:00 |
| DiogoCosta18 | 9ae334410d | Notebooks finished | 2026-05-11 17:36:08 +01:00 |
| DiogoCosta18 | 522a8f8d46 | Classifier notebooks finished | 2026-05-06 21:43:32 +01:00 |
| DiogoCosta18 | 69666d6aa0 | All notebooks without phase 4 results | 2026-05-06 20:31:07 +01:00 |
| DiogoCosta18 | b5313e3320 | Fixes to 5 notebooks | 2026-05-06 20:31:06 +01:00 |
| Johnny Fernandes | 580808d9ad | Phase 4 classifier results | 2026-05-06 19:14:19 +00:00 |
| Johnny Fernandes | e09cbf6a1a | Updating phase 4 classifier | 2026-05-06 12:42:22 +01:00 |
| DiogoCosta18 | ac3d6e1f6d | Notebooks Classifier | 2026-05-06 00:53:07 +01:00 |
| DiogoCosta18 | bed9be0b17 | Notebooks Classifier | 2026-05-06 00:53:06 +01:00 |
| Johnny Fernandes | 920cc983c4 | Phase 4 classifier | 2026-05-05 12:17:18 +01:00 |
| Johnny Fernandes | c89f934e74 | Phase 3 classifier results | 2026-05-05 11:42:01 +01:00 |
| Johnny Fernandes | 66913b2354 | Phase 3 classifier | 2026-05-05 11:41:07 +01:00 |
226 changed files with 25025 additions and 7827 deletions
+3
@@ -67,3 +67,6 @@ generator/outputs/samples/*
 .venv/
 .ipynb_checkpoints/
 __pycache__/
+#Presentation
+presentation_inputs.zip
+218 -79
@@ -1,125 +1,264 @@
-# DRL_PROJ — DeepFake Detection
-Deep learning project for binary deepfake detection on the DeepFakeFace dataset.
-## Project structure
-```
+# Deep learning face project
+This repository contains a two-part deep learning project on the
+DeepFakeFace (DFF) dataset:
+1. **Classifier:** detect whether a face image is real or fake.
+2. **Generator:** train generative models that produce new fake face images.
+The project is written as an experimental report. The notebooks are the main
+deliverable: they show the pipeline, the intermediate failures, the ablations,
+the decisions, and the final models. Read them in order.
+## Project story
+The work follows the same principle in both parts: start with a simple
+baseline, inspect what fails, change one important factor at a time, and keep
+the evidence tied to saved logs and saved artifacts.
+For the **classifier**, the story moves from dataset understanding to
+preprocessing, baseline models, controlled ablations, Grad-CAM inspection,
+stronger model families, and data scaling. The final practical classifier is a
+ResNet50-style pipeline using face crops, 224×224 inputs, ImageNet/default
+normalization, and no stochastic augmentation at validation/test time.
+For the **generator**, the story starts with raw baseline failures, then locks
+the data pipeline before comparing three parallel model-family branches:
+GAN, VAE, and DDPM. The final comparison keeps quality versus speed central:
+DDPM gives the best saved FID and visual quality, GAN is the best
+quality-speed compromise, and VAE is the fastest but smoothest option.
+## How to read the project
+Start with the classifier notebooks, then read the generator notebooks. The
+generator has one linear setup stage followed by three parallel branches:
+GAN, VAE, and DDPM. Those branches are numbered in reading order, but they are
+conceptually parallel experiments after the pipeline is selected.
+### Classifier notebooks
+Read these first:
+1. `classifier/notebooks/01_eda.ipynb`
+   Dataset composition, real/fake source mapping, image statistics, and
+   shortcut risks.
+2. `classifier/notebooks/02_preprocessing.ipynb`
+   Deterministic preprocessing, train-only augmentation, face crops, and
+   normalization.
+3. `classifier/notebooks/03_phase1_analysis.ipynb`
+   SimpleCNN and ResNet18 controlled baselines.
+4. `classifier/notebooks/04_phase2_analysis.ipynb`
+   Resolution, normalization, source holdouts, facecrop, and augmentation
+   ablations.
+5. `classifier/notebooks/05_gradcam_analysis.ipynb`
+   Qualitative localization analysis across the classifier pipeline.
+6. `classifier/notebooks/06_phase3_model_family_analysis.ipynb`
+   Stronger pretrained model families and the ResNet50 practical choice.
+7. `classifier/notebooks/07_phase4_data_scaling_analysis.ipynb`
+   Data scaling for strong backbones and the final classifier decision.
+### Generator notebooks
+Read these after the classifier:
+1. `generator/notebooks/01_baseline_sanity_check.ipynb`
+   Raw baseline failures and why the data pipeline must be fixed first.
+2. `generator/notebooks/02_pipeline_selection.ipynb`
+   Controlled pipeline ablations: resolution, alignment, augmentation, and
+   raw/aligned mixing.
+3. `generator/notebooks/03_gan_stability_progression.ipynb`
+   GAN branch: DCGAN → WGAN-GP → spectral normalization + GroupNorm +
+   self-attention → 128×128 check.
+4. `generator/notebooks/04_vae_loss_progression.ipynb`
+   VAE branch: MSE + KL → perceptual loss → PatchGAN adversarial loss.
+5. `generator/notebooks/05_ddpm_recipe_progression.ipynb`
+   DDPM branch: linear schedule → cosine schedule → v-prediction → wider
+   backbone.
+6. `generator/notebooks/06_final_family_comparison.ipynb`
+   Final comparison of the selected GAN, VAE, and DDPM recipes under saved
+   Phase 5 conditions.
+7. `generator/notebooks/07_final_sample_showcase.ipynb`
+   Curated final sample examples from saved outputs. This is qualitative
+   showcase material, not a replacement for FID.
+## What the notebooks do
+The notebooks are analysis/report chapters. They load existing configs, logs,
+figures, saved sample grids, checkpoints, and prediction summaries. They are
+not intended to launch new training runs.
+When a notebook shows a plot or image grid, the surrounding markdown explains:
+- what the artifact shows;
+- why it is needed;
+- how it supports the phase decision;
+- what limitation remains.
+This is important because the project is evaluated not only by final
+performance, but by the documented evolution of the solution.
+## Repository layout
+```text
 DRL_PROJ/
-  classifier/        ← discriminative model (real vs. fake classifier)
-    src/             ← model definitions, training, evaluation, preprocessing
-    configs/         ← experiment configs organised by phase
-      phase1/        ← baseline models (SimpleCNN, ResNet18)
-      phase2/        ← architecture sweep (ResNet variants, face-crop)
-      phase3/        ← EfficientNet, ViT, frequency-aware training
-      phase4/        ← ensemble strategies
-    tools/           ← analyse.py, ensemble.py, inference.py, facecrop.py
-    notebooks/       ← EDA, preprocessing, evaluation, GradCAM
-    outputs/         ← models, logs, figures (gitignored except .pt/.json)
-    run.py           ← main training entry point
-  generator/         ← generative model (GAN / VAE / diffusion) — in progress
-  pipeline/          ← Vast.ai ephemeral GPU orchestration
-  data/              ← dataset root (gitignored)
-  cropped/           ← MTCNN pre-cropped faces (gitignored)
-    classifier/      ← bbox crops for the classifier
-    generator/       ← landmark-aligned crops for the generator
+  classifier/
+    configs/         experiment configs by phase
+    notebooks/       classifier report notebooks
+    outputs/         saved logs, figures, Grad-CAM panels, checkpoints
+    src/             classifier data, models, training, evaluation
+    tests/           unit and smoke tests
+    tools/           facecrop, Grad-CAM, inference, reevaluation helpers
+
+  generator/
+    configs/         generator configs by phase/family
+    notebooks/       generator report notebooks and notebook builder
+    outputs/         saved logs, sample grids, final showcase artifacts
+    src/             generator data, models, training, metrics
+    tests/           unit and smoke tests
+    tools/           sampling and utility scripts
+
+  data/              original DFF dataset root, not committed
+  cropped/           preprocessed face crops, not committed
+  docs/              project statement and supporting documents
+  pipeline/          optional remote/GPU orchestration helpers
 ```
+## Rebuilding the generator notebooks
+The generator notebooks are generated from a single source file:
+```bash
+cd generator/notebooks
+python _build.py
+```
+That builder writes the numbered generator notebooks listed above. It uses
+existing saved logs and artifacts; it does not train models.
 ## Setup
-Create a local environment when you want to run the code directly on a machine you control:
+Create a conda environment and install the project requirements:
 ```bash
-python3 -m venv .venv
-source .venv/bin/activate
+conda create -n drl python=3.12
+conda activate drl
 python -m pip install --upgrade pip setuptools wheel
 python -m pip install -r requirements.txt
 ```
-## Local Training
+Use **Python 3.12**; some dependencies (for example `facenet-pytorch`) are
+unreliable on 3.13+.
+The raw dataset should be placed under `data/`. Preprocessed crops are stored
+under `cropped/`. These folders are intentionally not committed. To download
+and extract the dataset:
 ```bash
-python3 classifier/run.py classifier/configs/phase2/p2_resnet18_facecrop.json
-python3 classifier/run.py classifier/configs/phase3/p3_efficientnet_b0.json
+python classifier/tools/fetch_ds.py
+python classifier/tools/fetch_ds.py --data-dir /path/to/DFF
 ```
-## Ephemeral Vast.ai Pipeline
-The deployment/orchestration path now lives under [`pipeline/`](/run/host/mnt/shared/UP/DRL/DRL_PROJ/pipeline/README.md).
-One-time setup:
+Expected layout under the data root: `wiki/<identity>/*.jpg`,
+`inpainting/...`, `text2img/...`, `insight/...`.
+## Classifier — training
+From the repository root:
 ```bash
-cat > pipeline/.env <<'EOF'
-VAST_API_KEY=<your-api-key>
-VAST_SSH_PRIVATE_KEY=/home/your-user/.ssh/id_ed25519
-EOF
+# CPU (slow but valid)
+python classifier/run.py classifier/configs/phase4/p4_convnext_tiny_100pct.json
+
+# GPU when CUDA is available
+python classifier/run.py classifier/configs/phase4/p4_convnext_tiny_100pct.json --use-gpu
 ```
-End-to-end ephemeral run:
+Training uses 5-fold stratified group cross-validation. Per-fold checkpoints
+are saved as `classifier/outputs/models/{run_name}_fold{k}_best.pt` (and
+`_final.pt`). Override data or output locations with `--data-dir` and
+`--output-root`.
+**Primary delivery model** (best Phase 4 detector): config
+`classifier/configs/phase4/p4_convnext_tiny_100pct.json` with per-fold
+weights `classifier/outputs/models/p4_convnext_tiny_100pct_fold*_best.pt`.
+## Classifier — inference
+Classify a single image as real or fake:
 ```bash
-python3 -m pipeline run classifier/configs/phase2/p2_resnet18_facecrop.json --upload-data
+python classifier/tools/inference.py image.jpg classifier/configs/phase4/p4_convnext_tiny_100pct.json
 ```
-Interactive offer selection:
+This loads the config and the matching checkpoint, runs the image through the
+model, and prints a result like:
+```
+Image : image.jpg
+Model : p4_convnext_tiny_100pct (convnext_tiny)
+Device: cuda
+Result: FAKE (confidence: 74.7%)
+P(fake): 0.7466 P(real): 0.2534
+```
+If you omit `--checkpoint`, the tool automatically looks for a saved
+checkpoint under `classifier/outputs/models/` — first the single-run
+`{run_name}_best.pt`, then CV fold files `{run_name}_fold{k}_best.pt`, then
+`{run_name}_fold{k}_final.pt`. To use a specific fold:
 ```bash
-python3 -m pipeline offers --select-offer
+python classifier/tools/inference.py image.jpg classifier/configs/phase4/p4_convnext_tiny_100pct.json \
+  --checkpoint classifier/outputs/models/p4_convnext_tiny_100pct_fold0_best.pt
 ```
-You can override the ranking mode per run:
+## Generator — training
+From the repository root:
 ```bash
-python3 -m pipeline offers --sort price
-python3 -m pipeline offers --sort performance
-python3 -m pipeline offers --sort performance --price 0.14
+python generator/run.py generator/configs/phase0/p0_vae.json
+python generator/run.py generator/configs/phase0/p0_ddpm.json
 ```
-You can also filter by region:
+Generator training expects real-face images (default source is `wiki`); use
+`--data-dir` to point at your dataset tree. Checkpoints are saved under
+`generator/outputs/models/{run_name}_final_ema.pt` (EMA shadow) and
+`{run_name}_best_ema.pt` (lowest-FID snapshot).
+## Generator — inference (sampling)
+Generate 4×4 sample grids from Phase 5 EMA checkpoints:
 ```bash
-python3 -m pipeline offers --select-offer --region europe
-python3 -m pipeline offers --select-offer --region Portugal
-python3 -m pipeline offers --select-offer --region US
-python3 -m pipeline offers --select-offer --region europe --price 0.14
+python generator/tools/sampling.py --models p5_gan p5_vae p5_ddpm --samples 10
 ```
-To inspect which region strings are currently available from the search results:
-```bash
-python3 -m pipeline offers --list-regions
-```
-That command:
-- ensures your SSH public key is registered with Vast.ai
-- searches offers using the filters in `pipeline/defaults/vast.json`
-- creates an instance
-- waits for SSH readiness
-- syncs the repo
-- uploads `data/` when `--upload-data` is set
-- runs `python3 classifier/run.py ...`
-- downloads `classifier/outputs/`
-- for generator runs, rsyncs `generator/outputs/` back every 25 epochs and again at completion
-- destroys the instance automatically unless `--keep-on-failure` is set
-Useful commands:
-```bash
-python3 -m pipeline up
-python3 -m pipeline status <instance_id>
-python3 -m pipeline down <instance_id>
-```
-To override the default Vast search/runtime settings, copy `pipeline/defaults/vast.json`, edit it, and pass:
-```bash
-python3 -m pipeline run classifier/configs/phase3/p3_efficientnet_b0.json --pipeline-config /path/to/vast.override.json
-```
-The default policy in `pipeline/defaults/vast.json` now targets:
-- `1x` GPU
-- `RTX 3090` or `RTX 3090 Ti`
-- `<= $0.20/hour`
-- sorted by `dlperf` descending
-- uses `vastai/pytorch:latest` as the default image
+Options:
+- `--models` — which models to sample from (`p5_gan`, `p5_vae`, `p5_ddpm`;
+  defaults to all three).
+- `--samples` — number of grids per model (default 10).
+- `--output-dir` — where to write the PNGs (default
+  `generator/outputs/samples/final_comparison/`).
+- `--truncation` — optional latent truncation for the GAN (lower = less
+  diversity but sharper).
+- `--device` — `cuda` or `cpu` (default: auto-detect).
+Each grid is a 4×4 PNG of 16 images sampled from the model's EMA weights.
+GAN samples are drawn from random latent vectors, VAE samples decode from the
+learned prior, and DDPM samples use 50-step DDIM.
+## Final takeaway
+The project is best understood as a sequence of controlled decisions:
+1. cleanly define the data and preprocessing;
+2. establish simple baselines;
+3. improve one factor at a time;
+4. compare model families using saved evidence;
+5. report both performance and limitations.
+The classifier becomes reliable through source-aware preprocessing, stronger
+pretrained backbones, and scaling. The generator improves by first locking the
+face-aligned pipeline and then selecting the best recipe inside each model
+family before the final GAN/VAE/DDPM comparison.
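The README's inference section states a fixed fallback order for finding a checkpoint when `--checkpoint` is omitted, and its sample output reports the probability of the predicted class as the confidence. A minimal Python sketch of both rules follows, under stated assumptions: fold indices run from 0 to `max_folds - 1` with lower folds tried first, P(real) is `1 - P(fake)`, and a tie at 0.5 is labelled FAKE. The helper names `resolve_checkpoint` and `format_result` are hypothetical; they are not the functions in `classifier/tools/inference.py`.

```python
from pathlib import Path
from typing import Optional


def resolve_checkpoint(models_dir: Path, run_name: str, max_folds: int = 5) -> Optional[Path]:
    """Return the first checkpoint that exists, in the README's fallback order.

    Assumption: fold indices run 0..max_folds-1 and lower folds are tried first.
    """
    candidates = [models_dir / f"{run_name}_best.pt"]
    candidates += [models_dir / f"{run_name}_fold{k}_best.pt" for k in range(max_folds)]
    candidates += [models_dir / f"{run_name}_fold{k}_final.pt" for k in range(max_folds)]
    for path in candidates:
        if path.is_file():
            return path
    return None  # nothing saved yet for this run


def format_result(p_fake: float) -> str:
    """Render a binary prediction; assumes P(real) = 1 - P(fake)."""
    p_real = 1.0 - p_fake
    label = "FAKE" if p_fake >= 0.5 else "REAL"  # tie goes to FAKE (assumption)
    confidence = max(p_fake, p_real)  # probability of the predicted class
    return (
        f"Result: {label} (confidence: {confidence:.1%})\n"
        f"  P(fake): {p_fake:.4f}   P(real): {p_real:.4f}"
    )
```

With `p_fake = 0.7466` the formatter reproduces the `FAKE (confidence: 74.7%)` line from the README's example output. Every `_best.pt` candidate is checked before any `_final.pt` file, so a best checkpoint from any fold wins over a final checkpoint.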
+7
@@ -0,0 +1,7 @@
{
"pretrained": true,
"epochs": 15,
"image_size": 224,
"augment": false,
"data_dir": "cropped/classifier"
}
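`_base.json` pins `image_size` to 224 and sets `augment` to false, which matches the README's description of the final classifier: 224×224 inputs, ImageNet/default normalization, and no stochastic augmentation at validation/test time. A minimal NumPy sketch of such a deterministic evaluation transform is shown here. Assumptions: the input is an `H×W×3` uint8 RGB array, resizing is nearest-neighbour (chosen only to avoid an image-library dependency), and the output is channel-first as PyTorch models usually expect. The repository's own transform is not part of this diff and may resize or crop differently; `eval_transform` is a hypothetical name.

```python
import numpy as np

# Standard ImageNet channel statistics, as used with torchvision's pretrained weights.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def eval_transform(image: np.ndarray, image_size: int = 224) -> np.ndarray:
    """Deterministic eval preprocessing: uint8 HxWx3 -> float32 3xSxS (no randomness)."""
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an HxWx3 RGB array")
    h, w, _ = image.shape
    # Nearest-neighbour resize: pick evenly spaced source rows and columns.
    rows = np.arange(image_size) * h // image_size
    cols = np.arange(image_size) * w // image_size
    resized = image[rows[:, None], cols[None, :]]  # shape (S, S, 3)
    scaled = resized.astype(np.float32) / 255.0  # [0, 255] -> [0, 1]
    normalized = (scaled - IMAGENET_MEAN) / IMAGENET_STD  # per-channel standardization
    return normalized.transpose(2, 0, 1)  # HWC -> CHW
```

Because nothing in the function draws random numbers, the same image always yields the same tensor, which is the property `"augment": false` is meant to guarantee at evaluation time.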
@@ -0,0 +1,6 @@
{
"extends": "_base.json",
"run_name": "p4_convnext_tiny_100pct",
"backbone": "convnext_tiny",
"subsample": 1.0
}
@@ -0,0 +1,6 @@
{
"extends": "_base.json",
"run_name": "p4_convnext_tiny_50pct",
"backbone": "convnext_tiny",
"subsample": 0.5
}
@@ -0,0 +1,6 @@
{
"extends": "_base.json",
"run_name": "p4_efficientnet_b0_100pct",
"backbone": "efficientnet_b0",
"subsample": 1.0
}
@@ -0,0 +1,6 @@
{
"extends": "_base.json",
"run_name": "p4_efficientnet_b0_50pct",
"backbone": "efficientnet_b0",
"subsample": 0.5
}
@@ -0,0 +1,6 @@
{
"extends": "_base.json",
"run_name": "p4_resnet50_100pct",
"backbone": "resnet50",
"subsample": 1.0
}
@@ -0,0 +1,6 @@
{
"extends": "_base.json",
"run_name": "p4_resnet50_50pct",
"backbone": "resnet50",
"subsample": 0.5
}
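Each Phase 4 config above holds only `run_name`, `backbone`, and `subsample`, and points at `_base.json` through `"extends"` for the shared settings. A minimal sketch of how such an `"extends"` chain can be resolved follows. The merge rules are assumptions, not taken from the repository: the parent path is resolved relative to the child file (which would also cover the smoke config's `"../shared.json"`), the child's keys shallowly override the parent's, and cycles are rejected. `load_config` is a hypothetical name.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional, Set


def load_config(path: Path, _seen: Optional[Set[Path]] = None) -> Dict[str, Any]:
    """Load a JSON config, resolving an optional "extends" chain.

    Assumptions: "extends" is relative to the child file, and child keys
    shallowly override parent keys.
    """
    path = Path(path).resolve()
    seen = set() if _seen is None else _seen
    if path in seen:
        raise ValueError(f"circular 'extends' chain at {path}")
    seen.add(path)

    with path.open(encoding="utf-8") as f:
        config = json.load(f)

    parent_name = config.pop("extends", None)  # the merged result carries no "extends" key
    if parent_name is None:
        return config
    parent = load_config(path.parent / parent_name, seen)
    return {**parent, **config}  # child values win on conflicts
```

Under these rules `p4_resnet50_50pct` resolves to the five `_base.json` keys plus its own three, so the 50% and 100% variants differ only in `run_name` and `subsample`.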
+18
@@ -0,0 +1,18 @@
{
"extends": "../shared.json",
"run_name": "smoke",
"backbone": "simple_cnn",
"cnn_preset": "micro",
"dropout": 0.0,
"epochs": 1,
"cv_folds": 2,
"image_size": 64,
"batch_size": 8,
"num_workers": 0,
"early_stopping_patience": 0,
"subsample": 1.0,
"augment": false,
"lr": 0.001,
"T_max": 1,
"data_dir": "data"
}
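The smoke config overrides `cv_folds` to 2, the README describes 5-fold stratified group cross-validation, and the Phase 1 notebook deleted further down groups folds by basename. Grouping keeps every image that shares a basename on one side of each train/test split; assuming a basename identifies related real and fake variants of one face, those variants cannot leak across folds. The pure-Python sketch below shows only the grouping half. It assumes the group key is the file name (`PurePosixPath(path).name`) and balances fold sizes greedily without balancing class labels. For the stratified version, scikit-learn provides `sklearn.model_selection.StratifiedGroupKFold`; whether the repository uses it is not shown in this diff. `group_kfold` is a hypothetical name.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, List, Sequence


def group_kfold(paths: Sequence[str], n_folds: int) -> List[List[int]]:
    """Split sample indices into folds so that each basename lands in exactly one fold."""
    if n_folds < 2:
        raise ValueError("n_folds must be at least 2")
    groups: Dict[str, List[int]] = defaultdict(list)
    for idx, p in enumerate(paths):
        groups[PurePosixPath(p).name].append(idx)  # group key: file basename (assumption)
    if len(groups) < n_folds:
        raise ValueError("need at least as many groups as folds")

    folds: List[List[int]] = [[] for _ in range(n_folds)]
    # Largest groups first, each into the currently smallest fold; ties go to the
    # lowest fold index and names break size ties, so the split is deterministic.
    for name in sorted(groups, key=lambda g: (-len(groups[g]), g)):
        target = min(range(n_folds), key=lambda k: len(folds[k]))
        folds[target].extend(groups[name])
    return [sorted(f) for f in folds]
```

Fold `k` then serves as the test set while the remaining folds form the training data, so no basename appears on both sides of any split.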
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
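The Phase 1 analysis notebook deleted in the next hunk compares SimpleCNN and ResNet18 with `scipy.stats.ttest_rel` over the five fold scores, then reports Cohen's d with a pooled standard deviation, which is the independent-samples form. For fold-paired scores, a common paired effect size is Cohen's d_z: the mean of the per-fold differences divided by their sample standard deviation. It satisfies t = d_z * sqrt(n), where t is the paired t statistic that `ttest_rel` reports. The same notebook formats possibly missing per-source means with a `.4f` spec and a string fallback `'N/A'`, which fails when the fallback (or a `None` mean) reaches the format spec. A stdlib-only sketch of both points follows. It computes the t statistic but not the p-value (`ttest_rel` provides that), and the helper names `paired_t_and_dz` and `fmt_metric` are hypothetical.

```python
import math
import statistics
from typing import Optional, Sequence, Tuple


def paired_t_and_dz(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Paired t statistic and Cohen's d_z for per-fold scores of two models.

    d_z = mean(a - b) / sd(a - b) with the sample SD (n - 1), and t = d_z * sqrt(n).
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length sequences with at least 2 folds")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance; t and d_z are undefined")
    d_z = statistics.mean(diffs) / sd
    return d_z * math.sqrt(len(diffs)), d_z


def fmt_metric(mean: Optional[float], std: Optional[float] = None) -> str:
    """Format mean or mean±std, returning 'N/A' instead of raising on a missing mean."""
    if mean is None:
        return "N/A"
    return f"{mean:.4f}" if std is None else f"{mean:.4f}±{std:.4f}"
```

Since d_z and t differ only by the factor sqrt(n), the effect size stays tied to the same paired differences the t-test uses, whereas the pooled-SD d ignores the pairing across folds.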
@@ -1,702 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Phase 1 analysis: Architecture baseline\n",
"\n",
"This notebook analyzes the results of Phase 1 experiments comparing SimpleCNN and ResNet18 baselines under identical conditions.\n",
"\n",
"## Experimental setup\n",
"- **Models**: SimpleCNN (medium preset), ResNet18 (pretrained)\n",
"- **Data**: 20% subsample\n",
"- **Resolution**: 128×128\n",
"- **Face crop**: No\n",
"- **Augmentation**: No\n",
"- **Optimizer**: AdamW (lr=1e-4, weight_decay=1e-4)\n",
"- **Scheduler**: CosineAnnealingLR (T_max=15)\n",
"- **Epochs**: 15 with early stopping (patience=5)\n",
"- **Batch size**: 32\n",
"- **Cross-validation**: 5-fold stratified group CV by basename\n",
"- **Seed**: 42"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"from pathlib import Path\n",
"from scipy import stats\n",
"\n",
"# Set style\n",
"sns.set_style(\"whitegrid\")\n",
"plt.rcParams['figure.figsize'] = (12, 6)\n",
"plt.rcParams['font.size'] = 10\n",
"\n",
"# Paths\n",
"OUTPUTS_DIR = Path(\"../outputs/logs\")\n",
"MODELS_DIR = Path(\"../outputs/models\")\n",
"FIGURES_DIR = Path(\"../outputs/figures\")\n",
"FIGURES_DIR.mkdir(parents=True, exist_ok=True)\n",
"\n",
"print(\"Phase 1 Analysis: Architecture Baseline\")\n",
"print(\"=\"*50)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load CV results"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def load_cv_results(run_name):\n",
" \"\"\"Load cross-validation results from JSON file.\"\"\"\n",
" results_path = OUTPUTS_DIR / f\"{run_name}.json\"\n",
" if not results_path.exists():\n",
" print(f\"Warning: {results_path} not found\")\n",
" return None\n",
" with open(results_path) as f:\n",
" return json.load(f)\n",
"\n",
"# Load results for both models\n",
"simplecnn_results = load_cv_results(\"p1_simplecnn_baseline\")\n",
"resnet18_results = load_cv_results(\"p1_resnet18_baseline\")\n",
"\n",
"print(f\"SimpleCNN results loaded: {simplecnn_results is not None}\")\n",
"print(f\"ResNet18 results loaded: {resnet18_results is not None}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Overall metrics comparison\n",
"\n",
"Compare AUC, Accuracy, and F1 scores with mean ± std and 95% confidence intervals."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def extract_aggregated_metrics(results, model_name):\n",
" \"\"\"Extract aggregated metrics from CV results.\"\"\"\n",
" if results is None:\n",
" return None\n",
" \n",
" agg = results['aggregated_metrics']\n",
" return {\n",
" 'model': model_name,\n",
" 'auc_mean': agg['auc_roc']['mean'],\n",
" 'auc_std': agg['auc_roc']['std'],\n",
" 'auc_ci': agg['auc_roc']['ci_95'],\n",
" 'acc_mean': agg['accuracy']['mean'],\n",
" 'acc_std': agg['accuracy']['std'],\n",
" 'acc_ci': agg['accuracy']['ci_95'],\n",
" 'f1_mean': agg['f1']['mean'],\n",
" 'f1_std': agg['f1']['std'],\n",
" 'f1_ci': agg['f1']['ci_95'],\n",
" }\n",
"\n",
"# Extract metrics\n",
"simplecnn_metrics = extract_aggregated_metrics(simplecnn_results, 'SimpleCNN')\n",
"resnet18_metrics = extract_aggregated_metrics(resnet18_results, 'ResNet18')\n",
"\n",
"# Create comparison table\n",
"if simplecnn_metrics and resnet18_metrics:\n",
" comparison_df = pd.DataFrame([simplecnn_metrics, resnet18_metrics])\n",
" comparison_df.set_index('model', inplace=True)\n",
" \n",
" # Format for display\n",
" display_df = comparison_df.copy()\n",
" for metric in ['auc', 'acc', 'f1']:\n",
" display_df[f'{metric}_formatted'] = (\n",
" display_df[f'{metric}_mean'].apply(lambda x: f\"{x:.4f}\") + \" ± \" +\n",
" display_df[f'{metric}_std'].apply(lambda x: f\"{x:.4f}\") +\n",
" \" (95% CI: ±\" + display_df[f'{metric}_ci'].apply(lambda x: f\"{x:.4f}\") + \")\"\n",
" )\n",
" \n",
" print(\"\\nOverall Metrics Comparison (5-fold CV):\")\n",
" print(\"=\"*80)\n",
" for col in ['auc_formatted', 'acc_formatted', 'f1_formatted']:\n",
" metric_name = col.replace('_formatted', '').upper()\n",
" print(f\"\\n{metric_name}:\")\n",
" for model in display_df.index:\n",
" print(f\" {model}: {display_df.loc[model, col]}\")\n",
" \n",
" # Print improvement\n",
" print(\"\\n\" + \"=\"*80)\n",
" print(\"ResNet18 vs SimpleCNN Improvement:\")\n",
" print(\"=\"*80)\n",
" for metric in ['auc', 'acc', 'f1']:\n",
" mean_diff = resnet18_metrics[f'{metric}_mean'] - simplecnn_metrics[f'{metric}_mean']\n",
" pct_improvement = (mean_diff / simplecnn_metrics[f'{metric}_mean']) * 100\n",
" print(f\" {metric.upper()}: +{mean_diff:.4f} (+{pct_improvement:.2f}%)\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Visualization: Overall metrics comparison"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if simplecnn_metrics and resnet18_metrics:\n",
" fig, axes = plt.subplots(1, 3, figsize=(15, 5))\n",
" \n",
" models = ['SimpleCNN', 'ResNet18']\n",
" metrics_data = {\n",
" 'AUC-ROC': [simplecnn_metrics['auc_mean'], resnet18_metrics['auc_mean']],\n",
" 'Accuracy': [simplecnn_metrics['acc_mean'], resnet18_metrics['acc_mean']],\n",
" 'F1 Score': [simplecnn_metrics['f1_mean'], resnet18_metrics['f1_mean']],\n",
" }\n",
" errors = {\n",
" 'AUC-ROC': [simplecnn_metrics['auc_std'], resnet18_metrics['auc_std']],\n",
" 'Accuracy': [simplecnn_metrics['acc_std'], resnet18_metrics['acc_std']],\n",
" 'F1 Score': [simplecnn_metrics['f1_std'], resnet18_metrics['f1_std']],\n",
" }\n",
" \n",
" colors = ['#e74c3c', '#2ecc71'] # Red for SimpleCNN, Green for ResNet18\n",
" \n",
" for idx, (metric_name, values) in enumerate(metrics_data.items()):\n",
" ax = axes[idx]\n",
" bars = ax.bar(models, values, yerr=errors[metric_name], capsize=5, alpha=0.7, color=colors)\n",
" ax.set_ylabel(metric_name)\n",
" ax.set_title(f'{metric_name} Comparison')\n",
" ax.set_ylim(0.5, 1.0)\n",
" \n",
" # Add value labels on bars\n",
" for bar, value in zip(bars, values):\n",
" height = bar.get_height()\n",
" ax.text(bar.get_x() + bar.get_width()/2., height,\n",
" f'{value:.4f}',\n",
" ha='center', va='bottom', fontweight='bold')\n",
" \n",
" plt.tight_layout()\n",
" plt.savefig(FIGURES_DIR / 'phase1_overall_metrics.png', dpi=300, bbox_inches='tight')\n",
" plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Per-source metrics\n",
"\n",
"Analyze performance on each fake source (text2img, inpainting, insight). Note: Per-source metrics are not available in the current CV results format, so we analyze overall performance across all sources."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def extract_per_source_metrics(results, model_name):\n",
" \"\"\"Extract per-source metrics from CV results.\"\"\"\n",
" if results is None:\n",
" return None\n",
" \n",
" # Collect per-source metrics across folds\n",
" source_metrics = {}\n",
" \n",
" for fold_result in results['fold_results']:\n",
" # Check if per_source metrics are available\n",
" if 'per_source' in fold_result['test_metrics']:\n",
" for source, metrics in fold_result['test_metrics']['per_source'].items():\n",
" if source not in source_metrics:\n",
" source_metrics[source] = {'auc': [], 'acc': [], 'f1': []}\n",
" if 'auc_roc' in metrics and metrics['auc_roc'] is not None:\n",
" source_metrics[source]['auc'].append(metrics['auc_roc'])\n",
" if 'accuracy' in metrics:\n",
" source_metrics[source]['acc'].append(metrics['accuracy'])\n",
" if 'f1' in metrics and metrics['f1'] is not None:\n",
" source_metrics[source]['f1'].append(metrics['f1'])\n",
" \n",
" # Aggregate per-source metrics\n",
" aggregated = {}\n",
" for source, metrics in source_metrics.items():\n",
" aggregated[source] = {\n",
" 'auc_mean': np.mean(metrics['auc']) if metrics['auc'] else None,\n",
" 'auc_std': np.std(metrics['auc']) if len(metrics['auc']) > 1 else 0,\n",
" 'acc_mean': np.mean(metrics['acc']) if metrics['acc'] else None,\n",
" 'acc_std': np.std(metrics['acc']) if len(metrics['acc']) > 1 else 0,\n",
" 'f1_mean': np.mean(metrics['f1']) if metrics['f1'] else None,\n",
" 'f1_std': np.std(metrics['f1']) if len(metrics['f1']) > 1 else 0,\n",
" }\n",
" \n",
" return {'model': model_name, 'sources': aggregated}\n",
"\n",
"# Extract per-source metrics\n",
"simplecnn_source = extract_per_source_metrics(simplecnn_results, 'SimpleCNN')\n",
"resnet18_source = extract_per_source_metrics(resnet18_results, 'ResNet18')\n",
"\n",
"if simplecnn_source and resnet18_source:\n",
" print(\"\\nPer-Source Metrics Comparison:\")\n",
" print(\"=\"*80)\n",
" \n",
" for source in sorted(set(simplecnn_source['sources'].keys()) | set(resnet18_source['sources'].keys())):\n",
" print(f\"\\nSource: {source}\")\n",
" print(\"-\" * 40)\n",
" \n",
" scnn = simplecnn_source['sources'].get(source, {})\n",
" r18 = resnet18_source['sources'].get(source, {})\n",
" \n",
" print(f\" SimpleCNN: AUC={scnn.get('auc_mean', 'N/A'):.4f}±{scnn.get('auc_std', 0):.4f}, \"\n",
" f\"Acc={scnn.get('acc_mean', 'N/A'):.4f}±{scnn.get('acc_std', 0):.4f}, \"\n",
" f\"F1={scnn.get('f1_mean', 'N/A'):.4f}±{scnn.get('f1_std', 0):.4f}\")\n",
" print(f\" ResNet18: AUC={r18.get('auc_mean', 'N/A'):.4f}±{r18.get('auc_std', 0):.4f}, \"\n",
" f\"Acc={r18.get('acc_mean', 'N/A'):.4f}±{r18.get('acc_std', 0):.4f}, \"\n",
" f\"F1={r18.get('f1_mean', 'N/A'):.4f}±{r18.get('f1_std', 0):.4f}\")\n",
"else:\n",
" print(\"\\nNote: Per-source metrics not available in current CV results format.\")\n",
" print(\"The models were evaluated on all sources combined.\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train/Val/Test performance curves"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def plot_training_curves(results, model_name, ax):\n",
" \"\"\"Plot training curves for a model.\"\"\"\n",
" if results is None:\n",
" return\n",
" \n",
" # Aggregate histories across folds\n",
" all_histories = [fold['history'] for fold in results['fold_results']]\n",
" max_epochs = max(len(h['train_loss']) for h in all_histories)\n",
" \n",
" # Pad shorter histories with NaN\n",
" for history in all_histories:\n",
" for key in ['train_loss', 'val_loss', 'train_auc', 'val_auc']:\n",
" while len(history[key]) < max_epochs:\n",
" history[key].append(np.nan)\n",
" \n",
" # Compute mean and std across folds\n",
" epochs = np.arange(1, max_epochs + 1)\n",
" \n",
" train_loss_mean = np.nanmean([h['train_loss'] for h in all_histories], axis=0)\n",
" train_loss_std = np.nanstd([h['train_loss'] for h in all_histories], axis=0)\n",
" val_loss_mean = np.nanmean([h['val_loss'] for h in all_histories], axis=0)\n",
" val_loss_std = np.nanstd([h['val_loss'] for h in all_histories], axis=0)\n",
" \n",
" train_auc_mean = np.nanmean([h['train_auc'] for h in all_histories], axis=0)\n",
" train_auc_std = np.nanstd([h['train_auc'] for h in all_histories], axis=0)\n",
" val_auc_mean = np.nanmean([h['val_auc'] for h in all_histories], axis=0)\n",
" val_auc_std = np.nanstd([h['val_auc'] for h in all_histories], axis=0)\n",
" \n",
" # Plot loss\n",
" ax[0].plot(epochs, train_loss_mean, label=f'{model_name} (train)', marker='o', linewidth=2)\n",
" ax[0].fill_between(epochs, train_loss_mean - train_loss_std, train_loss_mean + train_loss_std, alpha=0.2)\n",
" ax[0].plot(epochs, val_loss_mean, label=f'{model_name} (val)', marker='s', linewidth=2)\n",
" ax[0].fill_between(epochs, val_loss_mean - val_loss_std, val_loss_mean + val_loss_std, alpha=0.2)\n",
" ax[0].set_xlabel('Epoch', fontweight='bold')\n",
" ax[0].set_ylabel('Loss', fontweight='bold')\n",
" ax[0].set_title('Training/Validation Loss', fontweight='bold')\n",
" ax[0].legend()\n",
" ax[0].grid(True, alpha=0.3)\n",
" \n",
" # Plot AUC\n",
" ax[1].plot(epochs, train_auc_mean, label=f'{model_name} (train)', marker='o', linewidth=2)\n",
" ax[1].fill_between(epochs, train_auc_mean - train_auc_std, train_auc_mean + train_auc_std, alpha=0.2)\n",
" ax[1].plot(epochs, val_auc_mean, label=f'{model_name} (val)', marker='s', linewidth=2)\n",
" ax[1].fill_between(epochs, val_auc_mean - val_auc_std, val_auc_mean + val_auc_std, alpha=0.2)\n",
" ax[1].set_xlabel('Epoch', fontweight='bold')\n",
" ax[1].set_ylabel('AUC-ROC', fontweight='bold')\n",
" ax[1].set_title('Training/Validation AUC', fontweight='bold')\n",
" ax[1].legend()\n",
" ax[1].grid(True, alpha=0.3)\n",
" ax[1].set_ylim(0.5, 1.0)\n",
"\n",
"# Plot curves for both models\n",
"fig, axes = plt.subplots(2, 2, figsize=(15, 10))\n",
"\n",
"plot_training_curves(simplecnn_results, 'SimpleCNN', axes[0])\n",
"plot_training_curves(resnet18_results, 'ResNet18', axes[1])\n",
"\n",
"plt.tight_layout()\n",
"plt.savefig(FIGURES_DIR / 'phase1_training_curves.png', dpi=300, bbox_inches='tight')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Confusion matrices"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def plot_confusion_matrices(results, model_name, ax):\n",
" \"\"\"Plot aggregated confusion matrix across folds.\"\"\"\n",
" if results is None:\n",
" return\n",
" \n",
" # Aggregate confusion matrices across folds\n",
" total_cm = np.array([[0, 0], [0, 0]])\n",
" \n",
" for fold_result in results['fold_results']:\n",
" cm = np.array(fold_result['test_metrics']['confusion_matrix'])\n",
" total_cm += cm\n",
" \n",
" # Normalize\n",
" cm_normalized = total_cm.astype('float') / total_cm.sum(axis=1)[:, np.newaxis]\n",
" \n",
" # Plot\n",
" im = ax.imshow(cm_normalized, interpolation='nearest', cmap=plt.cm.Blues, vmin=0, vmax=1)\n",
" ax.figure.colorbar(im, ax=ax)\n",
" \n",
" # Add text annotations\n",
" thresh = cm_normalized.max() / 2.\n",
" for i in range(2):\n",
" for j in range(2):\n",
" ax.text(j, i, f'{total_cm[i, j]}\\n({cm_normalized[i, j]:.2%})',\n",
" ha=\"center\", va=\"center\",\n",
" color=\"white\" if cm_normalized[i, j] > thresh else \"black\", fontsize=12)\n",
" \n",
" ax.set_ylabel('True Label', fontweight='bold')\n",
" ax.set_xlabel('Predicted Label', fontweight='bold')\n",
" ax.set_title(f'{model_name} Confusion Matrix', fontweight='bold')\n",
" ax.set_xticks([0, 1])\n",
" ax.set_yticks([0, 1])\n",
" ax.set_xticklabels(['Real', 'Fake'])\n",
" ax.set_yticklabels(['Real', 'Fake'])\n",
"\n",
"# Plot confusion matrices\n",
"fig, axes = plt.subplots(1, 2, figsize=(14, 6))\n",
"\n",
"plot_confusion_matrices(simplecnn_results, 'SimpleCNN', axes[0])\n",
"plot_confusion_matrices(resnet18_results, 'ResNet18', axes[1])\n",
"\n",
"plt.tight_layout()\n",
"plt.savefig(FIGURES_DIR / 'phase1_confusion_matrices.png', dpi=300, bbox_inches='tight')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Statistical significance testing\n",
"\n",
"Perform paired t-tests to determine if differences between models are statistically significant."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def perform_statistical_tests(results1, results2, model1_name, model2_name):\n",
" \"\"\"Perform paired t-tests between two models.\"\"\"\n",
" if results1 is None or results2 is None:\n",
" return None\n",
" \n",
" # Extract test AUC values across folds\n",
" auc1 = [fold['test_metrics']['auc_roc'] for fold in results1['fold_results']]\n",
" auc2 = [fold['test_metrics']['auc_roc'] for fold in results2['fold_results']]\n",
" \n",
" # Extract test accuracy values\n",
" acc1 = [fold['test_metrics']['accuracy'] for fold in results1['fold_results']]\n",
" acc2 = [fold['test_metrics']['accuracy'] for fold in results2['fold_results']]\n",
" \n",
" # Extract test F1 values\n",
" f1_1 = [fold['test_metrics']['f1'] for fold in results1['fold_results']]\n",
" f1_2 = [fold['test_metrics']['f1'] for fold in results2['fold_results']]\n",
" \n",
" # Perform paired t-tests\n",
" results = {\n",
" 'auc': stats.ttest_rel(auc1, auc2),\n",
" 'accuracy': stats.ttest_rel(acc1, acc2),\n",
" 'f1': stats.ttest_rel(f1_1, f1_2),\n",
" }\n",
" \n",
" print(f\"\\nStatistical Significance Testing: {model1_name} vs {model2_name}\")\n",
" print(\"=\"*80)\n",
" print(f\"\\nPaired t-test (5 folds):\")\n",
" print(f\"{'Metric':<15} {'t-statistic':<15} {'p-value':<15} {'Significant (α=0.05)':<25}\")\n",
" print(\"-\"*80)\n",
" \n",
" for metric, test_result in results.items():\n",
" is_significant = test_result.pvalue < 0.05\n",
" sig_str = \"*** YES ***\" if is_significant else \"No\"\n",
" print(f\"{metric.capitalize():<15} {test_result.statistic:<15.4f} {test_result.pvalue:<15.6f} {sig_str:<25}\")\n",
" \n",
" # Also compute effect size (Cohen's d)\n",
" print(\"\\n\" + \"-\"*80)\n",
" print(\"Effect Sizes (Cohen's d):\")\n",
" print(\"-\"*80)\n",
" \n",
" def cohens_d(x1, x2):\n",
" n1, n2 = len(x1), len(x2)\n",
" var1, var2 = np.var(x1, ddof=1), np.var(x2, ddof=1)\n",
" pooled_std = np.sqrt(((n1-1)*var1 + (n2-1)*var2) / (n1+n2-2))\n",
" return (np.mean(x1) - np.mean(x2)) / pooled_std\n",
" \n",
" for metric, values in {'AUC': (auc1, auc2), 'Accuracy': (acc1, acc2), 'F1': (f1_1, f1_2)}.items():\n",
" d = cohens_d(values[0], values[1])\n",
" print(f\" {metric}: {d:.4f} ({'large' if abs(d) > 0.8 else 'medium' if abs(d) > 0.5 else 'small'} effect)\")\n",
" \n",
" return results\n",
"\n",
"# Perform statistical tests\n",
"if simplecnn_results and resnet18_results:\n",
" test_results = perform_statistical_tests(\n",
" simplecnn_results, resnet18_results, 'SimpleCNN', 'ResNet18'\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Grad-CAM visualizations\n",
"\n",
"Generate Grad-CAM visualizations to understand what features the models focus on.\n",
"\n",
"**Note**: This section requires the trained models and sample images. The Grad-CAM visualization code is provided but requires:\n",
"1. Loading the trained model checkpoints\n",
"2. Selecting sample images from the test set\n",
"3. Running the Grad-CAM algorithm\n",
"\n",
"For now, we provide the code structure that can be executed when models are available."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"sys.path.insert(0, '..')\n",
"\n",
"from pathlib import Path\n",
"from src.data import DFFDataset, get_splits, build_transforms\n",
"from src.models import get_model\n",
"from src.utils import load_config, resolve_nested_fields\n",
"\n",
"OUTPUTS_DIR = Path(\"../outputs\")\n",
"MODELS_DIR = OUTPUTS_DIR / \"models\"\n",
"FIGURES_DIR = OUTPUTS_DIR / \"figures\"\n",
"FIGURES_DIR.mkdir(parents=True, exist_ok=True)\n",
"\n",
"# Load config and rebuild test split for fold 0\n",
"# cfg = load_config(\"../configs/phase1/p1_resnet18_baseline.json\")\n",
"# cfg = resolve_nested_fields(cfg)\n",
"# DATA_DIR = Path(\"../../data\")\n",
"# raw_ds = DFFDataset(DATA_DIR)\n",
"# splits = get_splits(raw_ds, cfg)\n",
"# transform_builder = build_transforms(raw_ds, cfg)\n",
"# _, _, test_idx = splits[0]\n",
"# test_ds = transform_builder(test_idx, train=False)\n",
"\n",
"# Load model checkpoint\n",
"# import torch\n",
"# model = get_model(cfg)\n",
"# ckpt = MODELS_DIR / \"p1_resnet18_baseline_fold0_best.pt\"\n",
"# model.load_state_dict(torch.load(ckpt, map_location=\"cpu\", weights_only=True))\n",
"\n",
"# Run Grad-CAM on top-confidence errors\n",
"# from tools.gradcam import save_overlays\n",
"# records = [...] # load from reevaluate output or predict_rows\n",
"# save_overlays(model, records, cfg, FIGURES_DIR / \"gradcam\", device=\"cpu\")\n",
"print(\"Grad-CAM ready — uncomment above once model checkpoints are available.\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusions\n",
"\n",
"### Summary template (fill after running all cells)\n",
"\n",
"Use this section only after metrics are generated.\n",
"Replace placeholders (`<...>`) with measured values.\n",
"\n",
"#### 1. Overall performance\n",
"\n",
"**Model comparison:** `<winner model>` vs `<other model>`\n",
"\n",
"- **AUC-ROC**: `<model A mean±std>` vs `<model B mean±std>`\n",
" - **Absolute delta**: `<delta>`\n",
" - **Relative delta**: `<percent change>`\n",
" - **Statistical test**: `<test name, p-value, effect size>`\n",
"\n",
"- **Accuracy**: `<model A mean±std>` vs `<model B mean±std>`\n",
" - **Absolute delta**: `<delta>`\n",
" - **Relative delta**: `<percent change>`\n",
" - **Statistical test**: `<test name, p-value, effect size>`\n",
"\n",
"- **F1 score**: `<model A mean±std>` vs `<model B mean±std>`\n",
" - **Absolute delta**: `<delta>`\n",
" - **Relative delta**: `<percent change>`\n",
" - **Statistical test**: `<test name, p-value, effect size>`\n",
"\n",
"#### 2. Training dynamics\n",
"\n",
"- **Convergence speed**: `<which model converges faster and by how many epochs>`\n",
"- **Overfitting pattern**:\n",
" - `<model A train-vs-val behavior>`\n",
" - `<model B train-vs-val behavior>`\n",
"- **Fold stability (variance)**: `<std/CI comparison across folds>`\n",
"\n",
"#### 3. Error analysis (confusion matrix)\n",
"\n",
"- **Model A**: `<main error mode>`\n",
"- **Model B**: `<main error mode>`\n",
"- **Key difference**: `<which error type improved/worsened and by how much>`\n",
"\n",
"#### 4. Why the better model likely performs better\n",
"\n",
"1. `<reason 1 tied to architecture/pretraining>`\n",
"2. `<reason 2 tied to optimization/generalization>`\n",
"3. `<reason 3 tied to feature capacity>`\n",
"\n",
"#### 5. Recommendations for Phase 2\n",
"\n",
"- **Primary baseline**: `<model>`\n",
"- **Secondary baseline**: `<model>`\n",
"- **Priority experiments**:\n",
" - `<experiment 1>`\n",
" - `<experiment 2>`\n",
" - `<experiment 3>`\n",
"\n",
"#### 6. Limitations and next checks\n",
"\n",
"- `<missing metric or analysis 1>`\n",
"- `<missing metric or analysis 2>`\n",
"\n",
"### Final verdict\n",
"\n",
"`<One concise paragraph with the decision and rationale based on generated metrics.>`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Save Analysis Results"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Save analysis summary\n",
"analysis_summary = {\n",
" 'phase': 'phase1',\n",
" 'models': ['SimpleCNN', 'ResNet18'],\n",
" 'simplecnn_metrics': simplecnn_metrics,\n",
" 'resnet18_metrics': resnet18_metrics,\n",
" 'improvement': {\n",
" 'auc': {\n",
" 'absolute': resnet18_metrics['auc_mean'] - simplecnn_metrics['auc_mean'],\n",
" 'percent': ((resnet18_metrics['auc_mean'] - simplecnn_metrics['auc_mean']) / simplecnn_metrics['auc_mean']) * 100\n",
" },\n",
" 'accuracy': {\n",
" 'absolute': resnet18_metrics['acc_mean'] - simplecnn_metrics['acc_mean'],\n",
" 'percent': ((resnet18_metrics['acc_mean'] - simplecnn_metrics['acc_mean']) / simplecnn_metrics['acc_mean']) * 100\n",
" },\n",
" 'f1': {\n",
" 'absolute': resnet18_metrics['f1_mean'] - simplecnn_metrics['f1_mean'],\n",
" 'percent': ((resnet18_metrics['f1_mean'] - simplecnn_metrics['f1_mean']) / simplecnn_metrics['f1_mean']) * 100\n",
" }\n",
" },\n",
" 'statistical_tests': {\n",
" 'auc_t_stat': test_results['auc'].statistic if test_results else None,\n",
" 'auc_p_value': test_results['auc'].pvalue if test_results else None,\n",
" 'acc_t_stat': test_results['accuracy'].statistic if test_results else None,\n",
" 'acc_p_value': test_results['accuracy'].pvalue if test_results else None,\n",
" 'f1_t_stat': test_results['f1'].statistic if test_results else None,\n",
" 'f1_p_value': test_results['f1'].pvalue if test_results else None,\n",
" } if test_results else None,\n",
" 'conclusions': {\n",
" 'best_model': 'ResNet18',\n",
" 'reason': 'Significantly better AUC, accuracy, and F1 scores with lower variance across folds',\n",
" 'recommendation': 'Use ResNet18 as primary baseline for Phase 2 experiments'\n",
" }\n",
"}\n",
"\n",
"with open(OUTPUTS_DIR / 'phase1_analysis_summary.json', 'w') as f:\n",
" json.dump(analysis_summary, f, indent=2)\n",
"\n",
"print(\"\\n\" + \"=\"*80)\n",
"print(\"Phase 1 Analysis Complete!\")\n",
"print(\"=\"*80)\n",
"print(\"\\nResults saved to:\")\n",
"print(f\" - {FIGURES_DIR / 'phase1_overall_metrics.png'}\")\n",
"print(f\" - {FIGURES_DIR / 'phase1_training_curves.png'}\")\n",
"print(f\" - {FIGURES_DIR / 'phase1_confusion_matrices.png'}\")\n",
"print(f\" - {OUTPUTS_DIR / 'phase1_analysis_summary.json'}\")\n",
"print(\"\\nKey Findings:\")\n",
"print(f\" - ResNet18 AUC: {resnet18_metrics['auc_mean']:.4f}±{resnet18_metrics['auc_std']:.4f}\")\n",
"print(f\" - SimpleCNN AUC: {simplecnn_metrics['auc_mean']:.4f}±{simplecnn_metrics['auc_std']:.4f}\")\n",
"print(f\" - Improvement: +{analysis_summary['improvement']['auc']['absolute']:.4f} (+{analysis_summary['improvement']['auc']['percent']:.2f}%)\")\n",
"print(f\" - Statistically significant: Yes (p < 0.001)\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "drl",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.13"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
@@ -1,904 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "54aa00ab",
"metadata": {},
"source": [
"# Phase 2 analysis\n",
"\n",
"This notebook follows the Phase 2 config organization (`p2a` to `p2e`) and maps each section directly to its config group.\n",
"It separates three concerns:\n",
"\n",
"1. **Experimental validity**: were expected configs/logs produced, and are comparisons fair?\n",
"2. **Evidence**: what do the 5-fold CV metrics support?\n",
"3. **Decision**: which preprocessing choices should move into Phase 3?\n"
]
},
{
"cell_type": "markdown",
"id": "734db3ee",
"metadata": {},
"source": [
"## Questions\n",
"\n",
"| Section | Config group | Question | Required evidence |\n",
"|---|---|---|---|\n",
"| 2A | `p2a_*` | Shortcut analysis: normalization + source holdout | `p2a_t1_original`, `p2a_t2_real_norm`, `p2a_t3_holdout_*` |\n",
"| 2B | `p2b_*` | Does 224 improve over 128? | `p2b_simplecnn_224`, `p2b_resnet18_224`, plus P1 128 fallbacks |\n",
"| 2C | `p2c_*` | Does face cropping help? | `p2c_simplecnn_facecrop`, `p2c_resnet18_facecrop` vs `p2b_*` |\n",
"| 2D | `p2d_*` | Does augmentation help without facecrop? | `p2d_simplecnn_aug`, `p2d_resnet18_aug` vs `p2b_*` |\n",
"| 2E | `p2e_*` | Does augmentation help with facecrop? | `p2e_simplecnn_facecrop_aug`, `p2e_resnet18_facecrop_aug` vs `p2c_*` |\n",
"\n",
"Decision criteria used here:\n",
"\n",
"- Prefer changes with positive mean AUC delta and no worsening of train/validation gap.\n",
"- Treat fold-level paired tests as directional evidence, not definitive proof, because `n=5` folds is small.\n",
"- Do not claim per-source generalization unless per-source or prediction-level outputs exist.\n",
"- Prefer the simplest Phase 3 setting when deltas are small or unsupported.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1f4c04b3",
"metadata": {},
"outputs": [],
"source": [
"from __future__ import annotations\n",
"\n",
"import json\n",
"import math\n",
"import os\n",
"import sys\n",
"from dataclasses import dataclass\n",
"from pathlib import Path\n",
"from typing import Any\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"from scipy import stats\n",
"\n",
"try:\n",
" from IPython.display import display\n",
"except Exception:\n",
" def display(obj):\n",
" print(obj)\n",
"\n",
"# Robust project-root detection whether the notebook is run from repo root,\n",
"# classifier/, or classifier/notebooks/.\n",
"def find_project_root(start: Path | None = None) -> Path:\n",
" start = (start or Path.cwd()).resolve()\n",
" for candidate in [start, *start.parents]:\n",
" if (candidate / \"classifier\" / \"v2.md\").exists() and (candidate / \"classifier\" / \"impl.md\").exists():\n",
" return candidate\n",
" raise RuntimeError(f\"Could not find project root from {start}\")\n",
"\n",
"PROJECT_ROOT = find_project_root()\n",
"CLASSIFIER_DIR = PROJECT_ROOT / \"classifier\"\n",
"LOGS_DIR = CLASSIFIER_DIR / \"outputs\" / \"logs\"\n",
"FIGURES_DIR = CLASSIFIER_DIR / \"outputs\" / \"figures\" / \"phase2\"\n",
"ANALYSIS_DIR = CLASSIFIER_DIR / \"outputs\" / \"analysis\"\n",
"CONFIG_DIR = CLASSIFIER_DIR / \"configs\"\n",
"\n",
"FIGURES_DIR.mkdir(parents=True, exist_ok=True)\n",
"ANALYSIS_DIR.mkdir(parents=True, exist_ok=True)\n",
"\n",
"if str(CLASSIFIER_DIR) not in sys.path:\n",
" sys.path.insert(0, str(CLASSIFIER_DIR))\n",
"\n",
"sns.set_theme(style=\"whitegrid\", context=\"notebook\")\n",
"plt.rcParams.update({\n",
" \"figure.figsize\": (12, 7),\n",
" \"axes.spines.top\": False,\n",
" \"axes.spines.right\": False,\n",
"})\n",
"\n",
"print(f\"Project root: {PROJECT_ROOT}\")\n",
"print(f\"Logs: {LOGS_DIR}\")\n",
"print(f\"Figures: {FIGURES_DIR}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "24830212",
"metadata": {},
"outputs": [],
"source": [
"@dataclass(frozen=True)\n",
"class RunSpec:\n",
" run: str\n",
" label: str\n",
" section: str\n",
" model: str\n",
" condition: str\n",
" intended_role: str\n",
" fallback_for: str | None = None\n",
"\n",
"RUN_SPECS = [\n",
" # 2A: shortcut analysis (normalization + source holdout), ResNet18 only.\n",
" RunSpec(\"p2a_t1_original\", \"ResNet18 ImageNet norm\", \"2A\", \"ResNet18\", \"imagenet_norm\", \"expected\"),\n",
" RunSpec(\"p2a_t2_real_norm\", \"ResNet18 real-train norm\", \"2A\", \"ResNet18\", \"real_train_norm\", \"expected\"),\n",
" RunSpec(\"p2a_t3_holdout_text2img\", \"Holdout text2img\", \"2A\", \"ResNet18\", \"holdout_text2img\", \"expected\"),\n",
" RunSpec(\"p2a_t3_holdout_inpainting\", \"Holdout inpainting\", \"2A\", \"ResNet18\", \"holdout_inpainting\", \"expected\"),\n",
" RunSpec(\"p2a_t3_holdout_insight\", \"Holdout insight\", \"2A\", \"ResNet18\", \"holdout_insight\", \"expected\"),\n",
"\n",
" # 2B: resolution effect (224 in phase2 vs 128 baseline fallback from phase1).\n",
" RunSpec(\"p1_simplecnn_baseline\", \"SimpleCNN 128 (P1 fallback)\", \"2B\", \"SimpleCNN\", \"128_no_crop_no_aug\", \"fallback\", \"p2b_simplecnn_128\"),\n",
" RunSpec(\"p1_resnet18_baseline\", \"ResNet18 128 (P1 fallback)\", \"2B\", \"ResNet18\", \"128_no_crop_no_aug\", \"fallback\", \"p2b_resnet18_128\"),\n",
" RunSpec(\"p2b_simplecnn_224\", \"SimpleCNN 224\", \"2B\", \"SimpleCNN\", \"224_no_crop_no_aug\", \"expected\"),\n",
" RunSpec(\"p2b_resnet18_224\", \"ResNet18 224\", \"2B\", \"ResNet18\", \"224_no_crop_no_aug\", \"expected\"),\n",
"\n",
" # 2C: facecrop effect at 224, no augmentation.\n",
" RunSpec(\"p2c_simplecnn_facecrop\", \"SimpleCNN facecrop\", \"2C\", \"SimpleCNN\", \"224_facecrop_no_aug\", \"expected\"),\n",
" RunSpec(\"p2c_resnet18_facecrop\", \"ResNet18 facecrop\", \"2C\", \"ResNet18\", \"224_facecrop_no_aug\", \"expected\"),\n",
"\n",
" # 2D: augmentation effect without facecrop.\n",
" RunSpec(\"p2d_simplecnn_aug\", \"SimpleCNN light aug\", \"2D\", \"SimpleCNN\", \"224_no_crop_aug\", \"expected\"),\n",
" RunSpec(\"p2d_resnet18_aug\", \"ResNet18 light aug\", \"2D\", \"ResNet18\", \"224_no_crop_aug\", \"expected\"),\n",
"\n",
" # 2E: augmentation effect with facecrop.\n",
" RunSpec(\"p2e_simplecnn_facecrop_aug\", \"SimpleCNN facecrop + aug\", \"2E\", \"SimpleCNN\", \"224_facecrop_aug\", \"expected\"),\n",
" RunSpec(\"p2e_resnet18_facecrop_aug\", \"ResNet18 facecrop + aug\", \"2E\", \"ResNet18\", \"224_facecrop_aug\", \"expected\"),\n",
"]\n",
"\n",
"# Use these aliases when synthetic 128 run IDs are requested for 2B.\n",
"RUN_ALIASES = {\n",
" \"p2b_simplecnn_128\": \"p1_simplecnn_baseline\",\n",
" \"p2b_resnet18_128\": \"p1_resnet18_baseline\",\n",
"}\n",
"\n",
"PLANNED_COMPARISONS = [\n",
" (\"2A\", \"ResNet18\", \"normalization\", \"p2a_t1_original\", \"p2a_t2_real_norm\", \"real_norm - imagenet_norm\"),\n",
" (\"2A\", \"ResNet18\", \"source_holdout\", \"p2a_t1_original\", \"p2a_t3_holdout_text2img\", \"holdout text2img - all-source\"),\n",
" (\"2A\", \"ResNet18\", \"source_holdout\", \"p2a_t1_original\", \"p2a_t3_holdout_inpainting\", \"holdout inpainting - all-source\"),\n",
" (\"2A\", \"ResNet18\", \"source_holdout\", \"p2a_t1_original\", \"p2a_t3_holdout_insight\", \"holdout insight - all-source\"),\n",
"\n",
" (\"2B\", \"SimpleCNN\", \"resolution\", \"p2b_simplecnn_128\", \"p2b_simplecnn_224\", \"224 - 128\"),\n",
" (\"2B\", \"ResNet18\", \"resolution\", \"p2b_resnet18_128\", \"p2b_resnet18_224\", \"224 - 128\"),\n",
"\n",
" (\"2C\", \"SimpleCNN\", \"facecrop\", \"p2b_simplecnn_224\", \"p2c_simplecnn_facecrop\", \"facecrop - no facecrop\"),\n",
" (\"2C\", \"ResNet18\", \"facecrop\", \"p2b_resnet18_224\", \"p2c_resnet18_facecrop\", \"facecrop - no facecrop\"),\n",
"\n",
" (\"2D\", \"SimpleCNN\", \"augmentation\", \"p2b_simplecnn_224\", \"p2d_simplecnn_aug\", \"light aug - no aug\"),\n",
" (\"2D\", \"ResNet18\", \"augmentation\", \"p2b_resnet18_224\", \"p2d_resnet18_aug\", \"light aug - no aug\"),\n",
"\n",
" (\"2E\", \"SimpleCNN\", \"facecrop + augmentation\", \"p2c_simplecnn_facecrop\", \"p2e_simplecnn_facecrop_aug\", \"facecrop+aug - facecrop\"),\n",
" (\"2E\", \"ResNet18\", \"facecrop + augmentation\", \"p2c_resnet18_facecrop\", \"p2e_resnet18_facecrop_aug\", \"facecrop+aug - facecrop\"),\n",
"]\n"
]
},
{
"cell_type": "markdown",
"id": "6e2ccd27",
"metadata": {},
"source": [
"## Evidence audit\n",
"\n",
"Before comparing numbers, check whether the planned artifacts exist. Dedicated `p2a_*_128` configs/logs are skipped or absent in this repository, so this notebook uses the matching Phase 1 baselines as explicit fallbacks for the 128 vs 224 resolution test."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "53356e8b",
"metadata": {},
"outputs": [],
"source": [
"def load_json(path: Path) -> dict[str, Any] | None:\n",
" if not path.exists():\n",
" return None\n",
" with path.open() as f:\n",
" return json.load(f)\n",
"\n",
"\n",
"def config_path_for(run: str) -> Path | None:\n",
" candidates = [\n",
" CONFIG_DIR / \"phase2\" / f\"{run}.json\",\n",
" CONFIG_DIR / \"phase2\" / f\"{run}.json.skip\",\n",
" CONFIG_DIR / \"phase1\" / f\"{run}.json\",\n",
" CONFIG_DIR / \"phase1\" / f\"{run}.json.skip\",\n",
" ]\n",
" return next((p for p in candidates if p.exists()), None)\n",
"\n",
"\n",
"def log_path_for(run: str) -> Path:\n",
" return LOGS_DIR / f\"{run}.json\"\n",
"\n",
"\n",
"def resolve_run(run: str) -> str:\n",
" return run if log_path_for(run).exists() else RUN_ALIASES.get(run, run)\n",
"\n",
"\n",
"def load_results(run: str) -> dict[str, Any] | None:\n",
" resolved = resolve_run(run)\n",
" return load_json(log_path_for(resolved))\n",
"\n",
"\n",
"def metric_values(results: dict[str, Any], metric: str = \"auc_roc\") -> np.ndarray:\n",
" vals = []\n",
" for fold in results.get(\"fold_results\", []):\n",
" value = fold.get(\"test_metrics\", {}).get(metric)\n",
" if value is not None:\n",
" vals.append(float(value))\n",
" return np.asarray(vals, dtype=float)\n",
"\n",
"\n",
"def best_epoch_gap(fold: dict[str, Any], metric: str = \"auc\") -> float | None:\n",
" hist = fold.get(\"history\", {})\n",
" train_key = f\"train_{metric}\"\n",
" val_key = f\"val_{metric}\"\n",
" train = hist.get(train_key, [])\n",
" val = hist.get(val_key, [])\n",
" if not train or not val:\n",
" return None\n",
" idx = int(np.nanargmax(np.asarray(val, dtype=float)))\n",
" return float(train[idx] - val[idx])\n",
"\n",
"\n",
"def final_epoch_gap(fold: dict[str, Any], metric: str = \"auc\") -> float | None:\n",
" hist = fold.get(\"history\", {})\n",
" train = hist.get(f\"train_{metric}\", [])\n",
" val = hist.get(f\"val_{metric}\", [])\n",
" if not train or not val:\n",
" return None\n",
" return float(train[-1] - val[-1])\n",
"\n",
"\n",
"def summarize_run(spec: RunSpec) -> dict[str, Any]:\n",
" resolved = resolve_run(spec.run)\n",
" results = load_results(spec.run)\n",
" config_path = config_path_for(spec.run) or config_path_for(resolved)\n",
" cfg = load_json(config_path) if config_path else None\n",
"\n",
" row = {\n",
" \"section\": spec.section,\n",
" \"run\": spec.run,\n",
" \"resolved_run\": resolved,\n",
" \"label\": spec.label,\n",
" \"model\": spec.model,\n",
" \"condition\": spec.condition,\n",
" \"role\": spec.intended_role,\n",
" \"fallback_for\": spec.fallback_for,\n",
" \"config_path\": str(config_path.relative_to(PROJECT_ROOT)) if config_path else None,\n",
" \"config_status\": \"present\" if config_path and config_path.suffix == \".json\" else (\"skipped\" if config_path else \"missing\"),\n",
" \"log_status\": \"present\" if log_path_for(spec.run).exists() else (\"fallback\" if resolved != spec.run and log_path_for(resolved).exists() else \"missing\"),\n",
" \"n_folds\": None,\n",
" \"auc_mean\": np.nan,\n",
" \"auc_std\": np.nan,\n",
" \"acc_mean\": np.nan,\n",
" \"f1_mean\": np.nan,\n",
" \"gap_best_mean\": np.nan,\n",
" \"gap_final_mean\": np.nan,\n",
" \"image_size\": None,\n",
" \"face_crop\": None,\n",
" \"augment\": None,\n",
" \"normalization\": None,\n",
" \"train_sources\": None,\n",
" \"eval_sources\": None,\n",
" }\n",
"\n",
" if cfg:\n",
" row.update({\n",
" \"image_size\": cfg.get(\"image_size\"),\n",
" \"face_crop\": cfg.get(\"face_crop\"),\n",
" \"augment\": \"light\" if isinstance(cfg.get(\"augment\"), dict) else cfg.get(\"augment\"),\n",
" \"normalization\": cfg.get(\"normalization\"),\n",
" \"train_sources\": tuple(cfg.get(\"train_sources\", [])) or None,\n",
" \"eval_sources\": tuple(cfg.get(\"eval_sources\", [])) or None,\n",
" })\n",
"\n",
" if results:\n",
" agg = results.get(\"aggregated_metrics\", {})\n",
" row.update({\n",
" \"n_folds\": results.get(\"n_folds\"),\n",
" \"auc_mean\": agg.get(\"auc_roc\", {}).get(\"mean\", np.nan),\n",
" \"auc_std\": agg.get(\"auc_roc\", {}).get(\"std\", np.nan),\n",
" \"acc_mean\": agg.get(\"accuracy\", {}).get(\"mean\", np.nan),\n",
" \"f1_mean\": agg.get(\"f1\", {}).get(\"mean\", np.nan),\n",
" })\n",
" best_gaps = [best_epoch_gap(f) for f in results.get(\"fold_results\", [])]\n",
" final_gaps = [final_epoch_gap(f) for f in results.get(\"fold_results\", [])]\n",
" best_gaps = [x for x in best_gaps if x is not None]\n",
" final_gaps = [x for x in final_gaps if x is not None]\n",
" row[\"gap_best_mean\"] = float(np.mean(best_gaps)) if best_gaps else np.nan\n",
" row[\"gap_final_mean\"] = float(np.mean(final_gaps)) if final_gaps else np.nan\n",
"\n",
" return row\n",
"\n",
"runs_df = pd.DataFrame([summarize_run(spec) for spec in RUN_SPECS])\n",
"\n",
"# Prefer canonical rows for analysis: keep fallbacks only where expected rows are missing.\n",
"canonical_runs_df = runs_df[runs_df[\"role\"] == \"expected\"].copy()\n",
"for missing_run, fallback_run in RUN_ALIASES.items():\n",
" mask = canonical_runs_df[\"run\"].eq(missing_run) & canonical_runs_df[\"log_status\"].eq(\"missing\")\n",
" if mask.any():\n",
" fallback = runs_df[runs_df[\"run\"].eq(fallback_run)].copy()\n",
" if not fallback.empty:\n",
" fallback.loc[:, \"run\"] = missing_run\n",
" fallback.loc[:, \"label\"] = fallback.iloc[0][\"label\"].replace(\" (P1 fallback)\", \"\") + \" [P1 fallback]\"\n",
" fallback.loc[:, \"role\"] = \"expected_via_fallback\"\n",
" canonical_runs_df = pd.concat([canonical_runs_df[~mask], fallback], ignore_index=True)\n",
"\n",
"print(\"Artifact audit:\")\n",
"display(runs_df[[\"section\", \"run\", \"resolved_run\", \"role\", \"config_status\", \"log_status\", \"n_folds\"]].sort_values([\"section\", \"run\"]))\n",
"\n",
"missing_expected = runs_df[(runs_df[\"role\"] == \"expected\") & (runs_df[\"log_status\"] == \"missing\")][\"run\"].tolist()\n",
"print(f\"\\nExpected runs with no direct log: {missing_expected or 'none'}\")\n",
"print(\"Fallbacks used:\", {k: v for k, v in RUN_ALIASES.items() if k in missing_expected})"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b21a9faf",
"metadata": {},
"outputs": [],
"source": [
"# Protocol consistency audit from loaded logs/configs.\n",
"protocol_fields = [\n",
" \"cv_folds\", \"batch_size\", \"early_stopping_patience\", \"seed\", \"subsample\",\n",
" \"lr\", \"weight_decay\", \"T_max\", \"epochs\",\n",
"]\n",
"\n",
"protocol_rows = []\n",
"for _, row in canonical_runs_df.iterrows():\n",
" results = load_results(row[\"run\"])\n",
" cfg = (results or {}).get(\"config\", {})\n",
" protocol_rows.append({\"run\": row[\"run\"], **{k: cfg.get(k) for k in protocol_fields}})\n",
"\n",
"protocol_df = pd.DataFrame(protocol_rows)\n",
"display(protocol_df)\n",
"\n",
"print(\"Field variability across loaded canonical runs:\")\n",
"for field in protocol_fields:\n",
" vals = sorted({str(v) for v in protocol_df[field].dropna().unique()})\n",
" print(f\" {field:28s}: {vals}\")"
]
},
{
"cell_type": "markdown",
"id": "6802bcd9",
"metadata": {},
"source": [
"## Results table\n",
"\n",
"The table below is ranked by AUC and includes two gap estimates:\n",
"\n",
"- `gap_best_mean`: train AUC minus validation AUC at each fold's best validation epoch. This is closest to the saved best checkpoint.\n",
"- `gap_final_mean`: train AUC minus validation AUC at the final epoch. This is useful for diagnosing late overfit but is less aligned with test evaluation."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "be1ec0ba",
"metadata": {},
"outputs": [],
"source": [
"analysis_df = canonical_runs_df[canonical_runs_df[\"log_status\"].isin([\"present\", \"fallback\"])].copy()\n",
"analysis_df = analysis_df.sort_values(\"auc_mean\", ascending=False)\n",
"\n",
"cols = [\n",
" \"section\", \"label\", \"run\", \"resolved_run\", \"model\", \"condition\", \"log_status\",\n",
" \"auc_mean\", \"auc_std\", \"acc_mean\", \"f1_mean\", \"gap_best_mean\", \"gap_final_mean\",\n",
"]\n",
"\n",
"display(\n",
" analysis_df[cols]\n",
" .style.format({\n",
" \"auc_mean\": \"{:.4f}\",\n",
" \"auc_std\": \"{:.4f}\",\n",
" \"acc_mean\": \"{:.4f}\",\n",
" \"f1_mean\": \"{:.4f}\",\n",
" \"gap_best_mean\": \"{:+.4f}\",\n",
" \"gap_final_mean\": \"{:+.4f}\",\n",
" })\n",
" .background_gradient(subset=[\"auc_mean\"], cmap=\"Greens\")\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1e0d21c1",
"metadata": {},
"outputs": [],
"source": [
"def paired_comparison(section: str, model: str, question: str, before: str, after: str, contrast: str) -> dict[str, Any]:\n",
" r0 = load_results(before)\n",
" r1 = load_results(after)\n",
" resolved_before = resolve_run(before)\n",
" resolved_after = resolve_run(after)\n",
" out = {\n",
" \"section\": section,\n",
" \"model\": model,\n",
" \"question\": question,\n",
" \"before\": before,\n",
" \"after\": after,\n",
" \"resolved_before\": resolved_before,\n",
" \"resolved_after\": resolved_after,\n",
" \"contrast\": contrast,\n",
" \"status\": \"ok\" if r0 and r1 else \"missing\",\n",
" \"n\": 0,\n",
" \"before_auc\": np.nan,\n",
" \"after_auc\": np.nan,\n",
" \"delta_auc\": np.nan,\n",
" \"delta_ci95\": np.nan,\n",
" \"ttest_p\": np.nan,\n",
" \"wilcoxon_p\": np.nan,\n",
" \"cohen_dz\": np.nan,\n",
" \"before_gap\": np.nan,\n",
" \"after_gap\": np.nan,\n",
" \"delta_gap\": np.nan,\n",
" \"interpretation\": \"insufficient data\",\n",
" \"caveat\": \"\",\n",
" }\n",
" if not (r0 and r1):\n",
" return out\n",
"\n",
" v0 = metric_values(r0, \"auc_roc\")\n",
" v1 = metric_values(r1, \"auc_roc\")\n",
" n = min(len(v0), len(v1))\n",
" v0, v1 = v0[:n], v1[:n]\n",
" diff = v1 - v0\n",
"\n",
" out.update({\n",
" \"n\": n,\n",
" \"before_auc\": float(np.mean(v0)),\n",
" \"after_auc\": float(np.mean(v1)),\n",
" \"delta_auc\": float(np.mean(diff)),\n",
" })\n",
"\n",
" if n >= 2:\n",
" sd = float(np.std(diff, ddof=1))\n",
" se = sd / math.sqrt(n) if sd > 0 else 0.0\n",
" out[\"delta_ci95\"] = float(stats.t.ppf(0.975, df=n - 1) * se) if n > 1 else np.nan\n",
" if sd > 0:\n",
" out[\"cohen_dz\"] = float(np.mean(diff) / sd)\n",
" out[\"ttest_p\"] = float(stats.ttest_rel(v1, v0).pvalue)\n",
" if n >= 3 and not np.allclose(diff, 0):\n",
" try:\n",
" out[\"wilcoxon_p\"] = float(stats.wilcoxon(diff).pvalue)\n",
" except ValueError:\n",
" pass\n",
"\n",
" gaps0 = [best_epoch_gap(f) for f in r0.get(\"fold_results\", [])]\n",
" gaps1 = [best_epoch_gap(f) for f in r1.get(\"fold_results\", [])]\n",
" gaps0 = np.asarray([x for x in gaps0 if x is not None], dtype=float)\n",
" gaps1 = np.asarray([x for x in gaps1 if x is not None], dtype=float)\n",
" if len(gaps0) and len(gaps1):\n",
" m = min(len(gaps0), len(gaps1))\n",
" out[\"before_gap\"] = float(np.mean(gaps0[:m]))\n",
" out[\"after_gap\"] = float(np.mean(gaps1[:m]))\n",
" out[\"delta_gap\"] = float(np.mean(gaps1[:m] - gaps0[:m]))\n",
"\n",
" if question == \"source_holdout\":\n",
" out[\"caveat\"] = \"Aggregate holdout-run AUC only; not held-out-source vs in-source AUC.\"\n",
" if before != resolved_before or after != resolved_after:\n",
" out[\"caveat\"] = (out[\"caveat\"] + \" \" if out[\"caveat\"] else \"\") + \"Uses Phase 1 fallback for missing p2a 128 log.\"\n",
"\n",
" if out[\"delta_auc\"] >= 0.01:\n",
" out[\"interpretation\"] = \"meaningful improvement\"\n",
" elif out[\"delta_auc\"] > 0.002:\n",
" out[\"interpretation\"] = \"small improvement\"\n",
" elif out[\"delta_auc\"] >= -0.002:\n",
" out[\"interpretation\"] = \"negligible change\"\n",
" elif out[\"delta_auc\"] > -0.01:\n",
" out[\"interpretation\"] = \"small drop\"\n",
" else:\n",
" out[\"interpretation\"] = \"meaningful drop\"\n",
" return out\n",
"\n",
"comparisons_df = pd.DataFrame([paired_comparison(*args) for args in PLANNED_COMPARISONS])\n",
"\n",
"# Benjamini-Hochberg correction across planned paired t-tests where available.\n",
"valid_p = comparisons_df[\"ttest_p\"].notna()\n",
"pvals = comparisons_df.loc[valid_p, \"ttest_p\"].to_numpy()\n",
"qvals = np.full(len(comparisons_df), np.nan)\n",
"if len(pvals):\n",
" order = np.argsort(pvals)\n",
" ranked = pvals[order]\n",
" adjusted = np.empty_like(ranked)\n",
" m = len(ranked)\n",
" running = 1.0\n",
" for i in range(m - 1, -1, -1):\n",
" running = min(running, ranked[i] * m / (i + 1))\n",
" adjusted[i] = running\n",
" qvals[np.where(valid_p)[0][order]] = adjusted\n",
"comparisons_df[\"bh_q\"] = qvals\n",
"\n",
"display(\n",
" comparisons_df[[\n",
" \"section\", \"model\", \"question\", \"contrast\", \"before_auc\", \"after_auc\", \"delta_auc\",\n",
" \"delta_ci95\", \"ttest_p\", \"bh_q\", \"wilcoxon_p\", \"cohen_dz\", \"delta_gap\", \"interpretation\", \"caveat\",\n",
" ]].style.format({\n",
" \"before_auc\": \"{:.4f}\",\n",
" \"after_auc\": \"{:.4f}\",\n",
" \"delta_auc\": \"{:+.4f}\",\n",
" \"delta_ci95\": \"\u00b1{:.4f}\",\n",
" \"ttest_p\": \"{:.4f}\",\n",
" \"bh_q\": \"{:.4f}\",\n",
" \"wilcoxon_p\": \"{:.4f}\",\n",
" \"cohen_dz\": \"{:+.2f}\",\n",
" \"delta_gap\": \"{:+.4f}\",\n",
" }).background_gradient(subset=[\"delta_auc\"], cmap=\"RdYlGn\", vmin=-0.06, vmax=0.06)\n",
")"
]
},
{
"cell_type": "markdown",
"id": "f20e5262",
"metadata": {},
"source": [
"## Visual summary\n",
"\n",
"Two plots are most useful for decision-making:\n",
"\n",
"- Ranking all conditions by AUC shows the best observed configurations but can overstate duplicated/near-identical runs.\n",
"- Paired delta plot shows the controlled effect of each preprocessing change and exposes uncertainty."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "42882c6a",
"metadata": {},
"outputs": [],
"source": [
"plot_df = analysis_df.copy()\n",
"plot_df[\"display_label\"] = plot_df[\"section\"] + \" | \" + plot_df[\"label\"]\n",
"plot_df = plot_df.sort_values(\"auc_mean\", ascending=True)\n",
"\n",
"fig, ax = plt.subplots(figsize=(11, max(7, 0.35 * len(plot_df))))\n",
"colors = {\"2A\": \"#4C78A8\", \"2B\": \"#F58518\", \"2C\": \"#54A24B\", \"2D\": \"#E45756\", \"2E\": \"#B279A2\"}\n",
"ax.barh(\n",
" plot_df[\"display_label\"],\n",
" plot_df[\"auc_mean\"],\n",
" xerr=plot_df[\"auc_std\"],\n",
" color=[colors.get(s, \"#999999\") for s in plot_df[\"section\"]],\n",
" alpha=0.85,\n",
")\n",
"ax.set_xlim(0.65, 1.0)\n",
"ax.set_xlabel(\"Mean AUC across CV folds\")\n",
"ax.set_title(\"Phase 2 Conditions Ranked by AUC\")\n",
"ax.axvline(0.95, color=\"black\", linewidth=1, linestyle=\"--\", alpha=0.4)\n",
    "for y, (_, row) in enumerate(plot_df.iterrows()):\n",
    "    ax.text(row[\"auc_mean\"] + 0.004, y, f\"{row['auc_mean']:.4f}\", va=\"center\", fontsize=9)\n",
"fig.tight_layout()\n",
"fig.savefig(FIGURES_DIR / \"ranked_auc.png\", dpi=200, bbox_inches=\"tight\")\n",
"plt.show()\n",
"\n",
"forest = comparisons_df.copy()\n",
"forest[\"display\"] = forest[\"section\"] + \" \" + forest[\"model\"] + \" - \" + forest[\"contrast\"]\n",
"forest = forest.iloc[::-1]\n",
"fig, ax = plt.subplots(figsize=(11, max(6, 0.45 * len(forest))))\n",
"y = np.arange(len(forest))\n",
"ax.errorbar(\n",
" forest[\"delta_auc\"], y,\n",
" xerr=forest[\"delta_ci95\"],\n",
" fmt=\"o\", color=\"#1F2937\", ecolor=\"#6B7280\", capsize=4,\n",
")\n",
"ax.axvline(0, color=\"black\", linewidth=1)\n",
"ax.axvspan(-0.002, 0.002, color=\"#9CA3AF\", alpha=0.18, label=\"negligible band\")\n",
"ax.set_yticks(y)\n",
"ax.set_yticklabels(forest[\"display\"])\n",
"ax.set_xlabel(\"Delta AUC (after - before), paired by fold\")\n",
"ax.set_title(\"Planned Phase 2 Effect Estimates\")\n",
"ax.legend(loc=\"lower right\")\n",
"fig.tight_layout()\n",
"fig.savefig(FIGURES_DIR / \"planned_effects.png\", dpi=200, bbox_inches=\"tight\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "e063cfc0",
"metadata": {},
"source": [
    "## 2B - Resolution impact\n",
    "\n",
    "This section compares 128x128 and 224x224 inputs. It uses the `p2b_*_224` runs, with the Phase 1 baselines serving as explicit 128x128 fallbacks.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "910bd5bd",
   "metadata": {},
   "outputs": [],
   "source": [
    "def comparison_subset(section: str, question: str | None = None) -> pd.DataFrame:\n",
    "    df = comparisons_df[comparisons_df[\"section\"].eq(section)].copy()\n",
    "    if question:\n",
    "        df = df[df[\"question\"].eq(question)]\n",
    "    return df\n",
    "\n",
    "\n",
    "def print_comparison_readout(df: pd.DataFrame) -> None:\n",
    "    for _, row in df.iterrows():\n",
    "        print(f\"{row['section']} {row['model']} - {row['contrast']}\")\n",
    "        print(f\"  AUC: {row['before_auc']:.4f} -> {row['after_auc']:.4f} ({row['delta_auc']:+.4f})\")\n",
    "        print(f\"  paired t p={row['ttest_p']:.4f}, BH q={row['bh_q']:.4f}, CI95 delta=\u00b1{row['delta_ci95']:.4f}\")\n",
    "        print(f\"  gap delta: {row['delta_gap']:+.4f}; interpretation: {row['interpretation']}\")\n",
    "        # Skip missing caveats; NaN is truthy, so check it explicitly.\n",
    "        if pd.notna(row['caveat']) and row['caveat']:\n",
    "            print(f\"  caveat: {row['caveat']}\")\n",
    "        print()\n",
    "\n",
    "print_comparison_readout(comparison_subset(\"2B\", \"resolution\"))\n",
    "\n",
    "res_plot = comparison_subset(\"2B\", \"resolution\")\n",
    "fig, ax = plt.subplots(figsize=(8, 5))\n",
    "for _, row in res_plot.iterrows():\n",
    "    r0, r1 = load_results(row[\"before\"]), load_results(row[\"after\"])\n",
    "    v0, v1 = metric_values(r0), metric_values(r1)\n",
    "    x = [0, 1]\n",
    "    # One grey line per CV fold, then the mean across folds.\n",
    "    for a, b in zip(v0, v1):\n",
    "        ax.plot(x, [a, b], color=\"#9CA3AF\", alpha=0.7)\n",
    "    ax.plot(x, [v0.mean(), v1.mean()], marker=\"o\", linewidth=3, label=row[\"model\"])\n",
    "ax.set_xticks([0, 1])\n",
    "ax.set_xticklabels([\"128\", \"224\"])\n",
    "ax.set_ylabel(\"AUC\")\n",
    "ax.set_title(\"2B Resolution: Fold-Paired AUC\")\n",
    "ax.legend()\n",
    "fig.tight_layout()\n",
    "fig.savefig(FIGURES_DIR / \"2b_resolution_paired.png\", dpi=200, bbox_inches=\"tight\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "530e8675",
   "metadata": {},
   "source": [
    "## 2C - Facecrop impact\n",
    "\n",
    "This section compares the `p2c_*_facecrop` runs against the matching `p2b_*_224` baselines, which do not use facecrop.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "13304d38",
   "metadata": {},
   "outputs": [],
   "source": [
    "print_comparison_readout(comparison_subset(\"2C\", \"facecrop\"))\n",
    "\n",
    "face_df = canonical_runs_df[canonical_runs_df[\"section\"].eq(\"2C\")].copy()\n",
    "fig, axes = plt.subplots(1, 2, figsize=(12, 5), sharey=False)\n",
    "for ax, model in zip(axes, [\"SimpleCNN\", \"ResNet18\"]):\n",
    "    sub = face_df[face_df[\"model\"].eq(model)].sort_values(\"face_crop\")\n",
    "    ax.bar(sub[\"condition\"], sub[\"auc_mean\"], yerr=sub[\"auc_std\"], color=[\"#D97706\", \"#059669\"], alpha=0.85, capsize=5)\n",
    "    ax.set_title(model)\n",
    "    ax.set_ylim(0.70 if model == \"SimpleCNN\" else 0.94, 0.99)\n",
    "    ax.set_ylabel(\"AUC\")\n",
    "    ax.tick_params(axis=\"x\", rotation=20)\n",
    "fig.suptitle(\"2C Facecrop Impact\")\n",
    "fig.tight_layout()\n",
    "fig.savefig(FIGURES_DIR / \"2c_facecrop.png\", dpi=200, bbox_inches=\"tight\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8702d10d",
   "metadata": {},
   "source": [
    "## 2A - Shortcut analysis\n",
    "\n",
    "The shortcut checks use the `p2a_*` configs:\n",
    "- `p2a_t1_original` vs `p2a_t2_real_norm` (normalization)\n",
    "- `p2a_t1_original` vs `p2a_t3_holdout_*` (source_holdout)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ec5e03ef",
"metadata": {},
"outputs": [],
"source": [
    "print_comparison_readout(comparison_subset(\"2A\"))\n",
    "\n",
    "# Inspect whether logs contain the per-source data needed by v2.md.\n",
    "source_audit = []\n",
    "for run in [\"p2a_t1_original\", \"p2a_t3_holdout_text2img\", \"p2a_t3_holdout_inpainting\", \"p2a_t3_holdout_insight\"]:\n",
    "    results = load_results(run)\n",
    "    has_per_source = False\n",
    "    has_records = False\n",
    "    example_keys = []\n",
    "    if results:\n",
    "        for fold in results.get(\"fold_results\", []):\n",
    "            tm = fold.get(\"test_metrics\", {})\n",
    "            example_keys = sorted(tm.keys())\n",
    "            has_per_source = has_per_source or any(k in tm for k in [\"per_source\", \"per_source_metrics\", \"pairwise_source_metrics\", \"source_metrics\", \"pair_metrics\"])\n",
    "            has_records = has_records or any(k in fold for k in [\"records\", \"predictions\", \"test_records\"])\n",
    "    source_audit.append({\n",
    "        \"run\": run,\n",
    "        \"has_per_source_metrics\": has_per_source,\n",
    "        \"has_prediction_records\": has_records,\n",
    "        \"test_metric_keys\": example_keys,\n",
    "    })\n",
    "source_audit_df = pd.DataFrame(source_audit)\n",
    "display(source_audit_df)\n",
    "\n",
    "holdout_runs = [\"p2a_t1_original\", \"p2a_t3_holdout_text2img\", \"p2a_t3_holdout_inpainting\", \"p2a_t3_holdout_insight\"]\n",
    "holdout_df = canonical_runs_df[canonical_runs_df[\"run\"].isin(holdout_runs)].copy()\n",
    "holdout_df[\"delta_vs_all_source\"] = holdout_df[\"auc_mean\"] - float(holdout_df.loc[holdout_df[\"run\"].eq(\"p2a_t1_original\"), \"auc_mean\"].iloc[0])\n",
    "\n",
    "fig, ax = plt.subplots(figsize=(9, 5))\n",
    "ax.bar(holdout_df[\"label\"], holdout_df[\"auc_mean\"], yerr=holdout_df[\"auc_std\"], color=\"#54A24B\", alpha=0.85, capsize=5)\n",
    "ax.set_ylim(0.88, 0.99)\n",
    "ax.set_ylabel(\"Aggregate AUC\")\n",
    "ax.set_title(\"2A Source Holdout Proxy: Aggregate Test AUC\")\n",
    "ax.tick_params(axis=\"x\", rotation=20)\n",
    "# Annotate each bar with its AUC delta relative to the all-source run.\n",
    "for i, (_, row) in enumerate(holdout_df.iterrows()):\n",
    "    ax.text(i, row[\"auc_mean\"] + 0.004, f\"{row['delta_vs_all_source']:+.3f}\", ha=\"center\", fontsize=9)\n",
    "fig.tight_layout()\n",
    "fig.savefig(FIGURES_DIR / \"2c_holdout_proxy.png\", dpi=200, bbox_inches=\"tight\")\n",
    "plt.show()\n",
    "\n",
    "print(\"Geometry diagnostic evidence:\")\n",
    "geometry_keys = []\n",
    "for run in [\"p2a_t1_original\", \"p2a_t2_real_norm\"]:\n",
    "    results = load_results(run)\n",
    "    cfg = (results or {}).get(\"config\", {})\n",
    "    geometry_keys.append({\n",
    "        \"run\": run,\n",
    "        \"config_geometry_condition\": cfg.get(\"geometry_condition\"),\n",
    "        \"has_matched_geometry_metric\": any(\n",
    "            \"geometry\" in str(k).lower() or \"matched\" in str(k).lower()\n",
    "            for fold in (results or {}).get(\"fold_results\", [])\n",
    "            for k in fold.get(\"test_metrics\", {}).keys()\n",
    "        ),\n",
    "    })\n",
    "display(pd.DataFrame(geometry_keys))"
]
},
{
"cell_type": "markdown",
"id": "2c3b8812",
"metadata": {},
"source": [
"## 2D / 2E - Augmentation impact and test-set integrity\n",
"\n",
"The augmentation question has two parts:\n",
"\n",
"- Does light augmentation help at 224 without facecrop?\n",
"- Does it help once facecrop is enabled?\n",
"\n",
    "The implementation must also guarantee that validation and test evaluation are deterministic. The preprocessing pipeline keeps its stochastic operations behind `self.train`, so `train=False` disables them even when augmentation settings are configured."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f11c3257",
"metadata": {},
"outputs": [],
"source": [
"print(\"2D (p2d): augmentation without facecrop\")\n",
"print_comparison_readout(comparison_subset(\"2D\", \"augmentation\"))\n",
"print(\"2E (p2e): augmentation with facecrop\")\n",
"print_comparison_readout(comparison_subset(\"2E\", \"facecrop + augmentation\"))\n",
"\n",
"aug_sections = comparisons_df[comparisons_df[\"section\"].isin([\"2D\", \"2E\"])].copy()\n",
"fig, ax = plt.subplots(figsize=(9, 5))\n",
"labels = aug_sections[\"section\"] + \" \" + aug_sections[\"model\"]\n",
"ax.bar(labels, aug_sections[\"delta_auc\"], yerr=aug_sections[\"delta_ci95\"], color=[\"#E45756\" if d < 0 else \"#059669\" for d in aug_sections[\"delta_auc\"]], alpha=0.85, capsize=5)\n",
"ax.axhline(0, color=\"black\", linewidth=1)\n",
"ax.set_ylabel(\"Delta AUC from adding augmentation\")\n",
"ax.set_title(\"Augmentation Effects Across Facecrop Conditions\")\n",
"ax.tick_params(axis=\"x\", rotation=20)\n",
"fig.tight_layout()\n",
"fig.savefig(FIGURES_DIR / \"2d_2e_augmentation_effects.png\", dpi=200, bbox_inches=\"tight\")\n",
"plt.show()\n",
"\n",
    "# Static audit of eval stochasticity: inspect the source code to confirm that each\n",
    "# stochastic transform is gated on self.train. No transform is executed here.\n",
    "try:\n",
    "    import inspect\n",
    "    from src.preprocessing.pipeline import DFFImagePipeline\n",
    "    from src.evaluation import evaluate as evaluate_module\n",
    "\n",
    "    pipeline_src = inspect.getsource(DFFImagePipeline)\n",
    "    build_transforms_src = inspect.getsource(evaluate_module.build_transforms)\n",
    "    stochastic_guards = {\n",
    "        \"flip_guarded_by_train\": \"if self.train and random.random() < self.hflip_p\" in pipeline_src,\n",
    "        \"rotate_guarded_by_train\": \"if self.train and self.rotation_degrees > 0\" in pipeline_src,\n",
    "        \"color_jitter_returns_when_not_train\": \"if not self.train:\" in pipeline_src,\n",
    "        \"blur_guarded_by_train\": \"if self.train and random.random() < self.blur_p\" in pipeline_src,\n",
    "        \"jpeg_guarded_by_train\": \"if self.train and random.random() < self.jpeg_p\" in pipeline_src,\n",
    "        \"erase_guarded_by_train\": \"if self.train and random.random() < self.erase_p\" in pipeline_src,\n",
    "        \"noise_guarded_by_train\": \"if self.train and random.random() < self.noise_p\" in pipeline_src,\n",
    "        \"cv_transform_uses_train_flag\": \"get_transforms(train=train\" in build_transforms_src,\n",
    "    }\n",
    "    display(pd.DataFrame([stochastic_guards]).T.rename(columns={0: \"passes\"}))\n",
    "except Exception as exc:\n",
    "    print(f\"Could not run transform audit: {exc}\")"
]
},
{
"cell_type": "markdown",
"id": "02e47658",
"metadata": {},
"source": [
"## Decision synthesis\n",
"\n",
"This section converts the evidence into Phase 3 settings. It intentionally distinguishes a recommendation from a claim:\n",
"\n",
"- Recommendation: choose the setting that is best supported for the next experiment.\n",
    "- Claim: what the current evidence supports. Some Phase 2A shortcut claims remain incomplete without per-source or matched-geometry outputs."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7034443c",
"metadata": {},
"outputs": [],
"source": [
"def get_delta(question: str, model: str | None = None, section: str | None = None) -> pd.DataFrame:\n",
    "    df = comparisons_df[comparisons_df[\"question\"].eq(question)].copy()\n",
    "    if model:\n",
    "        df = df[df[\"model\"].eq(model)]\n",
    "    if section:\n",
    "        df = df[df[\"section\"].eq(section)]\n",
    "    return df\n",
"\n",
"resolution_resnet = get_delta(\"resolution\", \"ResNet18\").iloc[0]\n",
"facecrop_resnet = get_delta(\"facecrop\", \"ResNet18\").iloc[0]\n",
"facecrop_simple = get_delta(\"facecrop\", \"SimpleCNN\").iloc[0]\n",
"aug_no_crop_resnet = get_delta(\"augmentation\", \"ResNet18\").iloc[0]\n",
"aug_no_crop_simple = get_delta(\"augmentation\", \"SimpleCNN\").iloc[0]\n",
"aug_crop_resnet = get_delta(\"facecrop + augmentation\", \"ResNet18\").iloc[0]\n",
"aug_crop_simple = get_delta(\"facecrop + augmentation\", \"SimpleCNN\").iloc[0]\n",
"norm = get_delta(\"normalization\", \"ResNet18\").iloc[0]\n",
"\n",
"recommendations = [\n",
" {\n",
" \"choice\": \"resolution\",\n",
" \"recommendation\": \"224x224\",\n",
" \"evidence\": f\"ResNet18 delta AUC {resolution_resnet.delta_auc:+.4f}; SimpleCNN does not determine Phase 3 capacity.\",\n",
" \"confidence\": \"high\" if resolution_resnet.delta_auc > 0.02 else \"medium\",\n",
" },\n",
" {\n",
" \"choice\": \"facecrop\",\n",
" \"recommendation\": \"use facecrop\",\n",
" \"evidence\": f\"Small positive deltas for both models: SimpleCNN {facecrop_simple.delta_auc:+.4f}, ResNet18 {facecrop_resnet.delta_auc:+.4f}.\",\n",
" \"confidence\": \"medium\",\n",
" },\n",
" {\n",
" \"choice\": \"augmentation\",\n",
" \"recommendation\": \"do not use light augmentation for Phase 3 at 20% data\",\n",
    "        \"evidence\": f\"SimpleCNN delta AUC is {aug_no_crop_simple.delta_auc:+.4f} without facecrop and {aug_crop_simple.delta_auc:+.4f} with facecrop; ResNet18 shows no consistent effect ({aug_no_crop_resnet.delta_auc:+.4f}, {aug_crop_resnet.delta_auc:+.4f}).\",\n",
" \"confidence\": \"high for SimpleCNN, medium for ResNet18\",\n",
" },\n",
" {\n",
" \"choice\": \"normalization\",\n",
" \"recommendation\": \"ImageNet normalization\",\n",
" \"evidence\": f\"Real-train-only normalization delta AUC {norm.delta_auc:+.4f}; no useful gain and less standard for pretrained ResNet.\",\n",
" \"confidence\": \"medium\",\n",
" },\n",
" {\n",
" \"choice\": \"shortcut/source claims\",\n",
    "        \"recommendation\": \"do not overclaim; add per-source or prediction exports before the final report\",\n",
" \"evidence\": \"Current CV logs lack held-out-source vs in-source AUC and matched-geometry test metrics.\",\n",
" \"confidence\": \"high\",\n",
" },\n",
"]\n",
"\n",
"recommendations_df = pd.DataFrame(recommendations)\n",
"display(recommendations_df)\n",
"\n",
"summary = {\n",
" \"phase\": \"phase2\",\n",
" \"source_documents\": [\"classifier/v2.md\", \"classifier/impl.md\"],\n",
" \"artifact_counts\": {\n",
" \"canonical_runs\": int(len(canonical_runs_df)),\n",
" \"loaded_canonical_runs\": int(canonical_runs_df[\"log_status\"].isin([\"present\", \"fallback\"]).sum()),\n",
" \"fallback_runs_used\": {k: v for k, v in RUN_ALIASES.items() if resolve_run(k) != k},\n",
" },\n",
" \"recommendations\": recommendations,\n",
" \"planned_comparisons\": comparisons_df.replace({np.nan: None}).to_dict(orient=\"records\"),\n",
" \"known_gaps\": [\n",
" \"Dedicated p2a_*_128 logs are absent/skipped; Phase 1 baselines are used as fallbacks.\",\n",
" \"Source holdout logs do not include prediction-level or per-source metrics, so held-out-source AUC vs in-source AUC cannot be computed.\",\n",
    "        \"No matched-geometry evaluation metric is present in the p2a logs, so the geometry shortcut analysis is incomplete.\",\n",
" ],\n",
"}\n",
"\n",
"summary_path = ANALYSIS_DIR / \"phase2_analysis_summary.json\"\n",
    "with summary_path.open(\"w\") as f:\n",
    "    json.dump(summary, f, indent=2)\n",
"\n",
"print(f\"Saved summary: {summary_path.relative_to(PROJECT_ROOT)}\")\n",
"print(f\"Saved figures: {FIGURES_DIR.relative_to(PROJECT_ROOT)}\")"
]
},
{
"cell_type": "markdown",
"id": "5a337f73",
"metadata": {},
"source": [
"## Report-ready conclusion\n",
"\n",
"The strongest Phase 2 result is the resolution effect for ResNet18: moving to 224x224 substantially improves AUC under the controlled CV protocol. Face cropping gives a small positive effect and is reasonable to carry forward, especially because it aligns the model with face evidence rather than background context. Light augmentation is not supported at this 20% data setting: it strongly hurts SimpleCNN and provides no reliable gain for ResNet18, with or without face cropping. ImageNet normalization remains preferable because real-train-only normalization does not improve AUC and is less aligned with pretrained ResNet expectations.\n",
"\n",
"Recommended Phase 3 preprocessing: **224x224, facecrop enabled, no light augmentation, ImageNet normalization**.\n",
"\n",
    "Limitations to fix before the final report: export prediction-level records or per-source pairwise metrics for the source-holdout runs, and add the matched-geometry evaluation required by the shortcut-analysis plan. Without those artifacts, Phase 2A supports only a limited shortcut analysis."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "drl",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
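The Benjamini-Hochberg loop in the comparisons cell walks the sorted p-values from the largest rank down, keeping a running minimum of `p_(i) * m / i`. The same adjustment can be written in vectorized NumPy form. What follows is a minimal sketch, not the notebook's own code. The function name `bh_qvalues` is hypothetical, and the sketch assumes that NaN marks comparisons without a p-value, mirroring the notebook's `valid_p` mask.

```python
import numpy as np


def bh_qvalues(pvals):
    """Benjamini-Hochberg q-values for a 1-D sequence of p-values (hypothetical helper).

    NaN entries are returned as NaN and excluded from the count m,
    as with the notebook's valid_p mask (assumption).
    """
    p = np.asarray(pvals, dtype=float)
    q = np.full(p.shape, np.nan)
    valid_idx = np.flatnonzero(~np.isnan(p))
    m = valid_idx.size
    if m == 0:
        return q
    order = np.argsort(p[valid_idx])
    ranked = p[valid_idx][order]
    # Step-up values p_(i) * m / i for ranks i = 1..m.
    raw = ranked * m / np.arange(1, m + 1)
    # A running minimum from the largest rank down keeps the q-values monotone.
    adjusted = np.minimum.accumulate(raw[::-1])[::-1]
    # Clip to 1, matching the loop's running = 1.0 start. For p-values in [0, 1]
    # the clip never binds, because the largest-rank value is p_(m) itself.
    q[valid_idx[order]] = np.minimum(adjusted, 1.0)
    return q


q = bh_qvalues([0.01, 0.04, 0.03, np.nan, 0.20])
# Expected values: 0.04, 0.0533, 0.0533, nan, 0.2
print(np.round(q, 4))
```

`np.minimum.accumulate` over the reversed array replaces the explicit reverse loop, and both forms produce the same q-values. The example shows that 0.03 and 0.04 share a q-value once monotonicity is enforced.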