Final polish

Johnny Fernandes
2026-05-14 21:16:03 +01:00
parent 3bff7eefb0
commit afd26f47d2
732 changed files with 4149 additions and 79134 deletions
+123 -25
@@ -1,4 +1,4 @@
# Deep Learning Face Project
# Deep learning face project
This repository contains a two-part deep learning project on the
DeepFakeFace (DFF) dataset:
@@ -10,7 +10,7 @@ The project is written as an experimental report. The notebooks are the main
deliverable: they show the pipeline, the intermediate failures, the ablations,
the decisions, and the final models. Read them in order.
## Project Story
## Project story
The work follows the same principle in both parts: start with a simple
baseline, inspect what fails, change one important factor at a time, and keep
@@ -19,7 +19,7 @@ the evidence tied to saved logs and saved artifacts.
For the **classifier**, the story moves from dataset understanding to
preprocessing, baseline models, controlled ablations, Grad-CAM inspection,
stronger model families, and data scaling. The final practical classifier is a
ResNet50-style pipeline using face crops, 224x224 inputs, ImageNet/default
ResNet50-style pipeline using face crops, 224×224 inputs, ImageNet/default
normalization, and no stochastic augmentation at validation/test time.
For the **generator**, the story starts with raw baseline failures, then locks
@@ -28,14 +28,14 @@ GAN, VAE, and DDPM. The final comparison keeps quality versus speed central:
DDPM gives the best saved FID and visual quality, GAN is the best
quality-speed compromise, and VAE is the fastest but smoothest option.
## How To Read The Project
## How to read the project
Start with the classifier notebooks, then read the generator notebooks. The
generator has one linear setup stage followed by three parallel branches:
GAN, VAE, and DDPM. Those branches are numbered in reading order, but they are
conceptually parallel experiments after the pipeline is selected.
### Classifier Notebooks
### Classifier notebooks
Read these first:
@@ -57,7 +57,7 @@ Read these first:
7. `classifier/notebooks/07_phase4_data_scaling_analysis.ipynb`
Data scaling for strong backbones and the final classifier decision.
### Generator Notebooks
### Generator notebooks
Read these after the classifier:
@@ -67,12 +67,12 @@ Read these after the classifier:
Controlled pipeline ablations: resolution, alignment, augmentation, and
raw/aligned mixing.
3. `generator/notebooks/03_gan_stability_progression.ipynb`
GAN branch: DCGAN -> WGAN-GP -> spectral normalization + GroupNorm +
self-attention -> 128x128 check.
GAN branch: DCGAN → WGAN-GP → spectral normalization + GroupNorm +
self-attention → 128×128 check.
4. `generator/notebooks/04_vae_loss_progression.ipynb`
VAE branch: MSE + KL -> perceptual loss -> PatchGAN adversarial loss.
VAE branch: MSE + KL → perceptual loss → PatchGAN adversarial loss.
5. `generator/notebooks/05_ddpm_recipe_progression.ipynb`
DDPM branch: linear schedule -> cosine schedule -> v-prediction -> wider
DDPM branch: linear schedule → cosine schedule → v-prediction → wider
backbone.
6. `generator/notebooks/06_final_family_comparison.ipynb`
Final comparison of the selected GAN, VAE, and DDPM recipes under saved
@@ -81,7 +81,7 @@ Read these after the classifier:
Curated final sample examples from saved outputs. This is qualitative
showcase material, not a replacement for FID.
## What The Notebooks Do
## What the notebooks do
The notebooks are analysis/report chapters. They load existing configs, logs,
figures, saved sample grids, checkpoints, and prediction summaries. They are
@@ -97,7 +97,7 @@ When a notebook shows a plot or image grid, the surrounding markdown explains:
This is important because the project is evaluated not only by final
performance, but by the documented evolution of the solution.
## Repository Layout
## Repository layout
```text
DRL_PROJ/
@@ -106,6 +106,7 @@ DRL_PROJ/
notebooks/ classifier report notebooks
outputs/ saved logs, figures, Grad-CAM panels, checkpoints
src/ classifier data, models, training, evaluation
tests/ unit and smoke tests
tools/ facecrop, Grad-CAM, inference, reevaluation helpers
generator/
@@ -113,6 +114,7 @@ DRL_PROJ/
notebooks/ generator report notebooks and notebook builder
outputs/ saved logs, sample grids, final showcase artifacts
src/ generator data, models, training, metrics
tests/ unit and smoke tests
tools/ sampling and utility scripts
data/ original DFF dataset root, not committed
@@ -121,11 +123,11 @@ DRL_PROJ/
pipeline/ optional remote/GPU orchestration helpers
```
## Rebuilding The Generator Notebooks
## Rebuilding the generator notebooks
The generator notebooks are generated from a single source file:
```powershell
```bash
cd generator/notebooks
python _build.py
```
@@ -133,24 +135,120 @@ python _build.py
That builder writes the numbered generator notebooks listed above. It uses
existing saved logs and artifacts; it does not train models.
## Running The Code
## Setup
Create an environment and install the project requirements:
Create a conda environment and install the project requirements:
```powershell
python -m venv .venv
.\.venv\Scripts\python.exe -m pip install --upgrade pip setuptools wheel
.\.venv\Scripts\python.exe -m pip install -r requirements.txt
```bash
conda create -n drl python=3.12
conda activate drl
python -m pip install --upgrade pip setuptools wheel
python -m pip install -r requirements.txt
```
Use **Python 3.12**; some dependencies (for example `facenet-pytorch`) are
unreliable on 3.13+.
The raw dataset should be placed under `data/`. Preprocessed crops are stored
under `cropped/`. These folders are intentionally not committed.
under `cropped/`. These folders are intentionally not committed. To download
and extract the dataset:
Execution entry points exist in `classifier/run.py` and `generator/run.py` for
reproducibility, but the report notebooks should be read as analysis over
already saved results.
```bash
python classifier/tools/fetch_ds.py
python classifier/tools/fetch_ds.py --data-dir /path/to/DFF
```
## Final Takeaway
Expected layout under the data root: `wiki/<identity>/*.jpg`,
`inpainting/...`, `text2img/...`, `insight/...`.
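Before training, it can help to confirm that the data root matches this layout. The following is a minimal sketch and not part of the repository: the helper name `check_dff_layout` is hypothetical, and it assumes that images are `.jpg` files somewhere below each of the four top-level folders.

```python
from pathlib import Path

# Top-level folders named in the expected layout above.
EXPECTED_SOURCES = ("wiki", "inpainting", "text2img", "insight")


def check_dff_layout(data_root):
    """Return {source: number of .jpg files}, or raise if a folder is missing.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    root = Path(data_root)
    missing = [name for name in EXPECTED_SOURCES if not (root / name).is_dir()]
    if missing:
        raise FileNotFoundError(
            f"missing folders under {root}: {', '.join(missing)}"
        )
    # Assumption: images are .jpg files at any depth below each source folder.
    return {
        name: sum(1 for _ in (root / name).rglob("*.jpg"))
        for name in EXPECTED_SOURCES
    }
```

Calling `check_dff_layout("data")` from the repository root would report one image count per source, which makes an incomplete download easy to spot.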
## Classifier — training
From the repository root:
```bash
# CPU (slow but valid)
python classifier/run.py classifier/configs/phase4/p4_convnext_tiny_100pct.json
# GPU when CUDA is available
python classifier/run.py classifier/configs/phase4/p4_convnext_tiny_100pct.json --use-gpu
```
Training uses 5-fold stratified group cross-validation. Each fold saves two
checkpoints under `classifier/outputs/models/`: `{run_name}_fold{k}_best.pt`
and `{run_name}_fold{k}_final.pt`. Override the data or output locations with
`--data-dir` and `--output-root`.
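Stratified group cross-validation keeps every sample of one group in the same fold while keeping the class mix roughly equal across folds. The sketch below is a simplified greedy illustration in plain Python, not the project's splitter. The function name is hypothetical, and treating one identity as one group is an assumption.

```python
from collections import Counter, defaultdict


def stratified_group_folds(labels, groups, n_folds=5):
    """Assign each sample a fold index; samples sharing a group share a fold.

    Greedy approximation for illustration only, not the project's splitter.
    """
    # Count how many samples of each class every group contributes.
    per_group = defaultdict(Counter)
    for label, group in zip(labels, groups):
        per_group[group][label] += 1

    classes = sorted(set(labels))
    fold_counts = [Counter() for _ in range(n_folds)]
    group_to_fold = {}

    # Place large groups first so that small groups can even out the folds.
    ordered = sorted(per_group, key=lambda g: (-sum(per_group[g].values()), str(g)))
    for group in ordered:
        contribution = per_group[group]

        def load_after(fold):
            # Squared per-class load of the fold if this group were added to it.
            return sum(
                (fold_counts[fold][c] + contribution[c]) ** 2 for c in classes
            )

        best = min(range(n_folds), key=load_after)
        group_to_fold[group] = best
        fold_counts[best].update(contribution)

    return [group_to_fold[g] for g in groups]
```

Because assignment happens per group, no group can leak between the training and validation sides of a fold; class balance is only approximate.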
**Primary delivery model** (best Phase 4 detector): config
`classifier/configs/phase4/p4_convnext_tiny_100pct.json` with per-fold
weights `classifier/outputs/models/p4_convnext_tiny_100pct_fold*_best.pt`.
## Classifier — inference
Classify a single image as real or fake:
```bash
python classifier/tools/inference.py image.jpg classifier/configs/phase4/p4_convnext_tiny_100pct.json
```
This loads the config and the matching checkpoint, runs the image through the
model, and prints a result like:
```
Image : image.jpg
Model : p4_convnext_tiny_100pct (convnext_tiny)
Device: cuda
Result: FAKE (confidence: 74.7%)
P(fake): 0.7466 P(real): 0.2534
```
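To show how the three numbers in that report relate, here is a minimal sketch. It assumes the model emits two logits, one for real and one for fake, which the README does not state; `format_prediction` is a hypothetical name and not the tool's actual code.

```python
import math


def format_prediction(logit_real, logit_fake):
    """Turn two class logits into a label, a confidence, and both probabilities."""
    # Numerically stable two-class softmax.
    peak = max(logit_real, logit_fake)
    e_real = math.exp(logit_real - peak)
    e_fake = math.exp(logit_fake - peak)
    p_fake = e_fake / (e_real + e_fake)
    p_real = 1.0 - p_fake

    label = "FAKE" if p_fake >= 0.5 else "REAL"
    confidence = max(p_fake, p_real)  # probability of the predicted class
    return (
        f"Result: {label} (confidence: {confidence:.1%})\n"
        f"P(fake): {p_fake:.4f} P(real): {p_real:.4f}"
    )
```

Under this reading, the reported confidence is simply the larger of the two probabilities, expressed as a percentage.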
If you omit `--checkpoint`, the tool automatically looks for a saved
checkpoint under `classifier/outputs/models/` — first the single-run
`{run_name}_best.pt`, then CV fold files `{run_name}_fold{k}_best.pt`, then
`{run_name}_fold{k}_final.pt`. To use a specific fold:
```bash
python classifier/tools/inference.py image.jpg classifier/configs/phase4/p4_convnext_tiny_100pct.json \
--checkpoint classifier/outputs/models/p4_convnext_tiny_100pct_fold0_best.pt
```
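The automatic lookup order described above can be sketched as follows. This is an illustration under stated assumptions, not the code in `inference.py`: `resolve_checkpoint` is a hypothetical name, and choosing the first fold file in sorted order is an assumption, since the README does not say which fold is picked.

```python
from pathlib import Path


def resolve_checkpoint(run_name, models_dir="classifier/outputs/models", checkpoint=None):
    """Return the checkpoint path to load, following the documented search order."""
    if checkpoint is not None:
        return Path(checkpoint)  # an explicit --checkpoint always wins

    models_dir = Path(models_dir)

    # 1. Single-run checkpoint.
    single = models_dir / f"{run_name}_best.pt"
    if single.is_file():
        return single

    # 2. CV fold "best" files, then 3. CV fold "final" files.
    for suffix in ("best", "final"):
        matches = sorted(models_dir.glob(f"{run_name}_fold*_{suffix}.pt"))
        if matches:
            # Assumption: take the first fold in sorted order.
            return matches[0]

    raise FileNotFoundError(f"no checkpoint for {run_name!r} in {models_dir}")
```

Raising when nothing matches keeps a missing checkpoint from turning into a silent run with untrained weights.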
## Generator — training
From the repository root:
```bash
python generator/run.py generator/configs/phase0/p0_vae.json
python generator/run.py generator/configs/phase0/p0_ddpm.json
```
Generator training expects real-face images (default source is `wiki`); use
`--data-dir` to point at your dataset tree. Checkpoints are saved in
`generator/outputs/models/` as `{run_name}_final_ema.pt` (EMA shadow) and
`{run_name}_best_ema.pt` (lowest-FID snapshot).
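An EMA shadow is a slowly moving average of the training weights, and it is the copy that gets saved and sampled from. A minimal sketch of the update rule follows, using plain floats in place of tensors. The function name and the decay value of 0.999 are assumptions, not values taken from the project configs.

```python
def ema_update(shadow, params, decay=0.999):
    """Move each shadow value a small step toward the current parameter value.

    shadow[name] = decay * shadow[name] + (1 - decay) * params[name]
    """
    for name, value in params.items():
        shadow[name] = decay * shadow[name] + (1.0 - decay) * value
    return shadow


# Usage: call once after every optimizer step.
shadow = {"w": 0.0}
for step in range(3):
    params = {"w": 1.0}  # stand-in for the freshly updated weights
    ema_update(shadow, params, decay=0.9)
```

With a decay close to 1 the shadow changes slowly, which tends to smooth out step-to-step noise in the raw weights.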
## Generator — inference (sampling)
Generate 4×4 sample grids from Phase 5 EMA checkpoints:
```bash
python generator/tools/sampling.py --models p5_gan p5_vae p5_ddpm --samples 10
```
Options:
- `--models` — which models to sample from (`p5_gan`, `p5_vae`, `p5_ddpm`;
defaults to all three).
- `--samples` — number of grids per model (default 10).
- `--output-dir` — where to write the PNGs (default
`generator/outputs/samples/final_comparison/`).
- `--truncation` — optional latent truncation for the GAN (lower = less
diversity but sharper).
- `--device` — `cuda` or `cpu` (default: auto-detect).
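One common form of latent truncation redraws any latent component whose magnitude exceeds a threshold. Whether `sampling.py` does this or rescales the latent instead is not stated here, so the sketch below is an assumption, and `truncated_latent` is a hypothetical name.

```python
import random


def truncated_latent(dim, truncation=None, rng=random):
    """Sample a standard-normal latent, redrawing components beyond the threshold."""
    if truncation is not None and truncation <= 0:
        raise ValueError("truncation must be positive")

    z = []
    for _ in range(dim):
        value = rng.gauss(0.0, 1.0)
        # Reject and redraw until the component lies in [-truncation, truncation].
        while truncation is not None and abs(value) > truncation:
            value = rng.gauss(0.0, 1.0)
        z.append(value)
    return z
```

A smaller threshold keeps latents near the mode of the prior, which is why it trades sample diversity for sharper, more typical outputs.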
Each grid is a 4×4 PNG of 16 images sampled from the model's EMA weights.
GAN samples are drawn from random latent vectors, VAE samples decode from the
learned prior, and DDPM samples use 50-step DDIM.
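DDIM sampling visits only a subset of the diffusion timesteps used in training. As a rough illustration, the sketch below picks 50 evenly spaced steps; the 1000 training steps and the uniform stride are assumptions, and `ddim_timesteps` is a hypothetical name rather than a project function.

```python
def ddim_timesteps(num_train_steps=1000, num_sample_steps=50):
    """Return evenly spaced timesteps, ordered from most noisy to least noisy."""
    if not 0 < num_sample_steps <= num_train_steps:
        raise ValueError("need 0 < num_sample_steps <= num_train_steps")
    stride = num_train_steps // num_sample_steps
    # Keep exactly num_sample_steps entries, then reverse for denoising order.
    steps = list(range(0, num_train_steps, stride))[:num_sample_steps]
    return steps[::-1]
```

Under these assumptions each sample needs 50 network evaluations instead of 1000, which is the main reason DDIM sampling is faster than stepping through every training timestep.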
## Final takeaway
The project is best understood as a sequence of controlled decisions: