Corrections to 5 notebooks

2026-05-06 17:45:55 +01:00
parent 580808d9ad
commit b5313e3320
20 changed files with 785 additions and 837 deletions
@@ -483,7 +483,7 @@
   "metadata": {},
   "source": [
    "<!-- phase2-config-matrix -->\n",
-    "**Readout:** Phase 2 contains five groups: normalization/source diagnostics (`p2a`), 224x224 resolution (`p2b`), facecrop (`p2c`), augmentation without facecrop (`p2d`), and augmentation with facecrop (`p2e`). The source-holdout configs train without one fake source and evaluate with it restored, which is why pairwise AUC is the critical readout.\n"
+    "Phase 2 has five experiment groups: normalization/source diagnostics (`p2a`), 224x224 resolution (`p2b`), facecrop (`p2c`), augmentation without facecrop (`p2d`), and augmentation with facecrop (`p2e`). The source-holdout configs deliberately remove one fake source during training and then evaluate with it restored, so pairwise AUC is the key number for those runs.\n"
   ]
  },
  {
@@ -861,7 +861,7 @@
   "id": "0354d84e",
   "metadata": {},
   "source": [
-    "**Readout:** Higher resolution barely changes SimpleCNN, but it substantially improves ResNet18 from AUC 0.9366 to 0.9660. This is the first major improvement step.\n"
+    "Higher resolution barely changes SimpleCNN, but it substantially improves ResNet18 from AUC `0.9366` to `0.9660`. This is the first major improvement step, and it fits the architecture story: the pretrained model can use the extra spatial detail, while the smaller scratch-trained CNN cannot turn it into much additional signal.\n"
   ]
  },
  {
@@ -929,7 +929,7 @@
   "id": "2aa30a40",
   "metadata": {},
   "source": [
-    "**Readout:** real_norm is only +0.0018 over the ImageNet/default normalization run. That is too small to drive the story, and ImageNet normalization remains the most standard choice for pretrained ResNet models.\n"
+    "The real-image normalization diagnostic is almost neutral: `real_norm` is only `+0.0018` AUC above the ImageNet/default normalization run. That difference is too small to drive the model choice, and ImageNet normalization remains the cleaner default for a pretrained ResNet because it matches the scale used during pretraining.\n"
   ]
  },
  {
@@ -1068,7 +1068,7 @@
   "id": "34d814ce",
   "metadata": {},
   "source": [
-    "**Readout:** Source holdout exposes the largest weakness. Holding out text2img drops `wiki_vs_text2img` to 0.7595; holding out insight drops `wiki_vs_insight` to 0.8421. Inpainting generalizes better at 0.9296. This means high global AUC should not be interpreted as source-agnostic generalization.\n"
+    "Source holdout exposes the largest weakness. Holding out `text2img` drops `wiki_vs_text2img` to `0.7595`; holding out `insight` drops `wiki_vs_insight` to `0.8421`; inpainting generalizes better at `0.9296`. The overall AUC is high, but these runs show that performance is not fully source-agnostic.\n"
   ]
  },
  {
@@ -1149,7 +1149,9 @@
   "id": "ee339e9e",
   "metadata": {},
   "source": [
-    "**Readout:** Facecrop helps ResNet18 but not SimpleCNN because the two architectures use visual evidence differently. ResNet18 is deeper and pretrained on ImageNet, so it already has strong mid-level and high-level visual features for faces, texture, edges, and object parts. When the input is cropped to the face, ResNet18 can use that concentrated facial evidence more effectively, improving from 224x224 no-crop AUC `0.9660` to facecrop AUC `0.9755`. SimpleCNN is much smaller and trained from scratch, so removing background/body/context can also remove cues that the simpler model was using.\n"
+    "Facecrop helps ResNet18 but not SimpleCNN because the two models use visual evidence at different levels. ResNet18 is deeper and pretrained on ImageNet, so it already has reusable filters for edges, textures, facial parts, and object structure. Once the input is cropped around the face, that model can spend more of its capacity on the region where manipulation evidence should live, improving from 224x224 no-crop AUC `0.9660` to facecrop AUC `0.9755`.\n",
+    "\n",
+    "SimpleCNN is much smaller and trained from scratch on this limited subset. It may rely more on broad cues such as framing, background, clothing, color balance, or source-specific texture. Facecrop removes many of those cues. That is good for the intended detector, but it can hurt a low-capacity model that had not learned enough stable face-level evidence.\n"
   ]
  },
  {
@@ -1277,9 +1279,9 @@
   "id": "202507a6",
   "metadata": {},
   "source": [
-    "**Readout:** Augmentation affects the two models very differently. For SimpleCNN, augmentation clearly hurts: without facecrop it drops from AUC `0.7853` to `0.7346`, and with facecrop it drops from `0.7661` to `0.7136`. This makes sense because SimpleCNN is small and trained from scratch. It has limited capacity to learn stable high-level face/manipulation features, so the augmented views may remove or disturb the simple cues it was using, such as color, local texture, framing, or background. Instead of improving generalization, augmentation makes the task harder for this weaker model.\n",
+    "Augmentation separates the models even more clearly. SimpleCNN drops in both settings: without facecrop it goes from AUC `0.7853` to `0.7346`, and with facecrop it goes from `0.7661` to `0.7136`. This is consistent with a low-capacity scratch model: the extra flips, rotations, color shifts, blur, noise, erasing, and recompression make the task harder before the model has learned robust face/manipulation features. Some of the cues it was using may simply be removed.\n",
    "\n",
-    "For ResNet18, augmentation is much less damaging because the model is deeper and starts from pretrained visual features. Without facecrop, augmentation is almost neutral in AUC (`0.9660` -> `0.9665`), but accuracy and F1 still fall, so it does not give a useful practical gain. With facecrop, augmentation is slightly negative (`0.9755` -> `0.9737`), suggesting that once the input is already focused on the face, extra blur/noise/erase/color changes may hide subtle fake evidence more than they help. In other words, augmentation can improve robustness only if it removes nuisance variation while preserving the real signal; here, at the 20% data setting, it seems to over-regularize rather than improve source-generalization. The best supported classifier remains ResNet18 at 224x224 with facecrop and no augmentation.\n"
+    "ResNet18 is more resilient because it starts from stronger pretrained visual features. Without facecrop, augmentation is almost neutral in AUC (`0.9660` -> `0.9665`), but accuracy and F1 still fall, so there is no practical win. With facecrop, augmentation becomes slightly negative (`0.9755` -> `0.9737`). Once the image is already concentrated on the face, the extra perturbations may hide the subtle evidence that made facecrop useful. For the current 20% data setting, augmentation looks more like over-regularization than improved generalization, so the best supported classifier remains ResNet18 at 224x224 with facecrop and no augmentation.\n"
   ]
  },
  {