All notebooks without results, phase 4

2026-05-06 20:28:29 +01:00
parent b5313e3320
commit 69666d6aa0
16 changed files with 2312 additions and 533 deletions
@@ -7,17 +7,19 @@
   "source": [
    "# 05 - Grad-CAM Interpretability Analysis\n",
    "\n",
-    "This final classifier notebook adds qualitative evidence. It does not train, tune, or reevaluate models. It loads existing configs, logs, and checkpoints, selects deterministic fold-0 examples, and renders fake-logit Grad-CAM overlays. Metrics reported in the report remain the canonical log values; checkpoint-derived candidate scores in this notebook are only used to choose visual examples.\n",
+    "This interpretability notebook adds qualitative evidence after the Phase 2 ablations. It does not train, tune, or reevaluate models. It loads existing configs, logs, and checkpoints, selects deterministic fold-0 examples, and renders fake-logit Grad-CAM overlays. Metrics reported in the report remain the canonical log values; checkpoint-derived candidate scores in this notebook are only used to choose visual examples.\n",
    "\n",
    "Grad-CAM answers a limited question: which spatial regions most support the model's fake-class logit for a selected image? It is useful for sanity checking localization, but it is not proof of causality and it should not override held-out metrics.\n",
    "\n",
    "A note on resolution: the visible Grad-CAM grid comes from the target convolutional feature map, not from the original image. ResNet18's final convolution is very coarse at 224x224 input, so its last-layer CAM is upsampled from a small grid and appears blockier than some SimpleCNN maps. That block size is architectural granularity, not model confidence. The notebook keeps the canonical last-conv CAM and also adds a finer ResNet18 diagnostic view from an earlier layer for readability.\n",
    "\n",
    "Story questions:\n",
-    "- Does the selected final run focus on facial evidence rather than background shortcuts?\n",
+    "- Does the selected Phase 2 run focus on facial evidence rather than background shortcuts?\n",
    "- Does facecrop change what the model can attend to?\n",
    "- Do augmentation and source-holdout runs reveal instability in attention?\n",
-    "- Are errors visually plausible, or do they suggest shortcut behavior?\n"
+    "- Are errors visually plausible, or do they suggest shortcut behavior?\n",
+    "\n",
+    "Roadmap link: after this qualitative check, `06_phase3_model_family_analysis.ipynb` compares stronger pretrained backbones and `07_phase4_data_scaling_analysis.ipynb` records the planned data-scaling analysis.\n"
   ]
  },
  {
@@ -1693,11 +1695,13 @@
   "id": "7a682e64",
   "metadata": {},
   "source": [
-    "## Report-ready conclusion\n",
+    "## Conclusion\n",
    "\n",
-    "Grad-CAM provides a qualitative final check on the classifier story. The selected metric setting remains `p2c_resnet18_facecrop`: 224x224 input, facecrop enabled, no augmentation, and ImageNet/default normalization. The overlays are most reassuring when they concentrate on facial regions, and most cautionary when errors or source-holdout examples show diffuse, background, or artifact-specific attention.\n",
+    "Grad-CAM provides a qualitative check on the Phase 2 classifier story. The selected metric setting remains `p2c_resnet18_facecrop`: 224x224 input, facecrop enabled, no augmentation, and ImageNet/default normalization. The overlays are most reassuring when they concentrate on facial regions, and most cautionary when errors or source-holdout examples show diffuse, background, or artifact-specific attention.\n",
    "\n",
-    "The key limitation from Phase 2 still stands: high in-distribution AUC does not guarantee source-agnostic generalization. The Grad-CAM panels help make that limitation visible, but the source-holdout pairwise AUC values are the primary quantitative evidence.\n"
+    "The key limitation from Phase 2 still stands: high in-distribution AUC does not guarantee source-agnostic generalization. The Grad-CAM panels help make that limitation visible, but the source-holdout pairwise AUC values are the primary quantitative evidence.\n",
+    "\n",
+    "Next: `06_phase3_model_family_analysis.ipynb` asks whether stronger pretrained model families improve on the selected Phase 2 pipeline.\n"
   ]
  }
 ],