Johnny Fernandes afd26f47d2 Final polish
2026-05-14 21:16:03 +01:00

# Deep learning face project
This repository contains a two-part deep learning project on the
DeepFakeFace (DFF) dataset:
1. **Classifier:** detect whether a face image is real or fake.
2. **Generator:** train generative models that produce new fake face images.
The project is written as an experimental report. The notebooks are the main
deliverable: they show the pipeline, the intermediate failures, the ablations,
the decisions, and the final models. Read them in order.
## Project story
The work follows the same principle in both parts: start with a simple
baseline, inspect what fails, change one important factor at a time, and keep
the evidence tied to saved logs and saved artifacts.
For the **classifier**, the story moves from dataset understanding to
preprocessing, baseline models, controlled ablations, Grad-CAM inspection,
stronger model families, and data scaling. The final practical classifier is a
ResNet50-style pipeline using face crops, 224×224 inputs, ImageNet/default
normalization, and no stochastic augmentation at validation/test time.
For the **generator**, the story starts with raw baseline failures, then locks
the data pipeline before comparing three parallel model-family branches:
GAN, VAE, and DDPM. The final comparison keeps the quality-versus-speed
trade-off central: DDPM gives the best saved FID and visual quality, GAN is the
best quality-speed compromise, and VAE is the fastest option but produces the
smoothest samples.
## How to read the project
Start with the classifier notebooks, then read the generator notebooks. The
generator has one linear setup stage followed by three parallel branches:
GAN, VAE, and DDPM. Those branches are numbered in reading order, but they are
conceptually parallel experiments after the pipeline is selected.
### Classifier notebooks
Read these first:
1. `classifier/notebooks/01_eda.ipynb`
Dataset composition, real/fake source mapping, image statistics, and
shortcut risks.
2. `classifier/notebooks/02_preprocessing.ipynb`
Deterministic preprocessing, train-only augmentation, face crops, and
normalization.
3. `classifier/notebooks/03_phase1_analysis.ipynb`
SimpleCNN and ResNet18 controlled baselines.
4. `classifier/notebooks/04_phase2_analysis.ipynb`
Resolution, normalization, source holdouts, facecrop, and augmentation
ablations.
5. `classifier/notebooks/05_gradcam_analysis.ipynb`
Qualitative localization analysis across the classifier pipeline.
6. `classifier/notebooks/06_phase3_model_family_analysis.ipynb`
Stronger pretrained model families and the ResNet50 practical choice.
7. `classifier/notebooks/07_phase4_data_scaling_analysis.ipynb`
Data scaling for strong backbones and the final classifier decision.
### Generator notebooks
Read these after the classifier:
1. `generator/notebooks/01_baseline_sanity_check.ipynb`
Raw baseline failures and why the data pipeline must be fixed first.
2. `generator/notebooks/02_pipeline_selection.ipynb`
Controlled pipeline ablations: resolution, alignment, augmentation, and
raw/aligned mixing.
3. `generator/notebooks/03_gan_stability_progression.ipynb`
GAN branch: DCGAN → WGAN-GP → spectral normalization + GroupNorm +
self-attention → 128×128 check.
4. `generator/notebooks/04_vae_loss_progression.ipynb`
VAE branch: MSE + KL → perceptual loss → PatchGAN adversarial loss.
5. `generator/notebooks/05_ddpm_recipe_progression.ipynb`
DDPM branch: linear schedule → cosine schedule → v-prediction → wider
backbone.
6. `generator/notebooks/06_final_family_comparison.ipynb`
Final comparison of the selected GAN, VAE, and DDPM recipes under saved
Phase 5 conditions.
7. `generator/notebooks/07_final_sample_showcase.ipynb`
Curated final sample examples from saved outputs. This is qualitative
showcase material, not a replacement for FID.
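
The DDPM branch in notebook 05 moves from a linear to a cosine noise schedule. As a rough illustration of what that change means, the sketch below compares how much signal each schedule keeps over time. The constants (1000 steps, betas from 1e-4 to 0.02, offset `s = 0.008`) are commonly used defaults and are assumptions here, not values read from this repository's configs.

```python
import math


def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fraction for a linear beta schedule (assumed defaults)."""
    alpha_bar, out = 1.0, []
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        alpha_bar *= 1.0 - beta
        out.append(alpha_bar)
    return out


def cosine_alpha_bar(num_steps=1000, s=0.008):
    """Cumulative signal fraction for a cosine schedule (assumed offset s)."""
    def f(t):
        return math.cos((t / num_steps + s) / (1 + s) * math.pi / 2) ** 2

    return [f(t + 1) / f(0) for t in range(num_steps)]


if __name__ == "__main__":
    lin, cos = linear_alpha_bar(), cosine_alpha_bar()
    for t in (0, 250, 500, 750, 999):
        print(f"t={t:4d}  linear={lin[t]:.4f}  cosine={cos[t]:.4f}")
```

Under these assumptions the cosine schedule keeps noticeably more signal in the middle of the chain, which is the usual motivation for trying it after a linear baseline.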
## What the notebooks do
The notebooks are analysis/report chapters. They load existing configs, logs,
figures, saved sample grids, checkpoints, and prediction summaries. They are
not intended to launch new training runs.
When a notebook shows a plot or image grid, the surrounding markdown explains:
- what the artifact shows;
- why it is needed;
- how it supports the phase decision;
- what limitation remains.
This matters because the project is evaluated not only on final performance
but also on the documented evolution of the solution.
## Repository layout
```text
DRL_PROJ/
  classifier/
    configs/      experiment configs by phase
    notebooks/    classifier report notebooks
    outputs/      saved logs, figures, Grad-CAM panels, checkpoints
    src/          classifier data, models, training, evaluation
    tests/        unit and smoke tests
    tools/        facecrop, Grad-CAM, inference, reevaluation helpers
  generator/
    configs/      generator configs by phase/family
    notebooks/    generator report notebooks and notebook builder
    outputs/      saved logs, sample grids, final showcase artifacts
    src/          generator data, models, training, metrics
    tests/        unit and smoke tests
    tools/        sampling and utility scripts
  data/           original DFF dataset root, not committed
  cropped/        preprocessed face crops, not committed
  docs/           project statement and supporting documents
  pipeline/       optional remote/GPU orchestration helpers
```
## Rebuilding the generator notebooks
The generator notebooks are generated from a single source file:
```bash
cd generator/notebooks
python _build.py
```
That builder writes the numbered generator notebooks listed above. It uses
existing saved logs and artifacts; it does not train models.
## Setup
Create a conda environment and install the project requirements:
```bash
conda create -n drl python=3.12
conda activate drl
python -m pip install --upgrade pip setuptools wheel
python -m pip install -r requirements.txt
```
Use **Python 3.12**; some dependencies (for example `facenet-pytorch`) are
unreliable on 3.13+.
The raw dataset should be placed under `data/`. Preprocessed crops are stored
under `cropped/`. These folders are intentionally not committed. To download
and extract the dataset:
```bash
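# Download and extract into the default data root
python classifier/tools/fetch_ds.py
# Or download and extract into a custom data root
python classifier/tools/fetch_ds.py --data-dir /path/to/DFF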
```
Expected layout under the data root: `wiki/<identity>/*.jpg`,
`inpainting/...`, `text2img/...`, `insight/...`.
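
A quick way to confirm the layout before training is to check that the four source folders exist. The helper below is a hypothetical sketch (it is not part of the repository's tools) and only tests for the top-level folder names listed above.

```python
from pathlib import Path

# Top-level source folders named in the expected layout.
EXPECTED_SOURCES = ("wiki", "inpainting", "text2img", "insight")


def missing_sources(data_root):
    """Return the expected source folders that are absent under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_SOURCES if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_sources("data")
    print("layout OK" if not missing else f"missing: {', '.join(missing)}")
```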
## Classifier — training
From the repository root:
```bash
# CPU (slow but valid)
python classifier/run.py classifier/configs/phase4/p4_convnext_tiny_100pct.json
# GPU when CUDA is available
python classifier/run.py classifier/configs/phase4/p4_convnext_tiny_100pct.json --use-gpu
```
Training uses 5-fold stratified group cross-validation. Per-fold checkpoints
are saved as `classifier/outputs/models/{run_name}_fold{k}_best.pt` (and
`_final.pt`). Override data or output locations with `--data-dir` and
`--output-root`.
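
To make the cross-validation scheme concrete, here is a minimal pure-Python sketch of stratified group fold assignment. It is a greedy approximation written for illustration, not the repository's implementation, and it assumes that the groups are identities: all images of one identity land in the same fold, and each fold gets a similar real/fake ratio.

```python
from collections import Counter, defaultdict
from statistics import pstdev


def stratified_group_folds(labels, groups, n_splits=5):
    """Assign each sample a fold index; a group never spans two folds."""
    per_group = defaultdict(Counter)
    for label, group in zip(labels, groups):
        per_group[group][label] += 1
    totals = Counter(labels)
    classes = sorted(totals)
    fold_counts = [Counter() for _ in range(n_splits)]
    assignment = {}
    # Place the largest groups first; they are the hardest to balance.
    ordered = sorted(per_group.items(), key=lambda kv: -sum(kv[1].values()))
    for group, counts in ordered:
        best_fold, best_score = 0, None
        for fold in range(n_splits):
            fold_counts[fold].update(counts)  # tentatively add the group
            # Spread of each class across folds; lower means better balance.
            score = sum(
                pstdev([fold_counts[k][c] / totals[c] for k in range(n_splits)])
                for c in classes
            )
            fold_counts[fold].subtract(counts)  # undo the tentative add
            if best_score is None or score < best_score:
                best_fold, best_score = fold, score
        fold_counts[best_fold].update(counts)
        assignment[group] = best_fold
    return [assignment[group] for group in groups]


if __name__ == "__main__":
    labels = [i % 2 for i in range(40)]            # 0 = real, 1 = fake
    groups = [f"id{i // 4}" for i in range(40)]    # 4 images per identity
    folds = stratified_group_folds(labels, groups)
    print(Counter(folds))
```

Keeping identities together is what prevents the same face from appearing in both the training and validation split of a fold.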
**Primary delivery model** (best Phase 4 detector): config
`classifier/configs/phase4/p4_convnext_tiny_100pct.json` with per-fold
weights `classifier/outputs/models/p4_convnext_tiny_100pct_fold*_best.pt`.
## Classifier — inference
Classify a single image as real or fake:
```bash
python classifier/tools/inference.py image.jpg classifier/configs/phase4/p4_convnext_tiny_100pct.json
```
This loads the config and the matching checkpoint, runs the image through the
model, and prints a result like:
```text
Image : image.jpg
Model : p4_convnext_tiny_100pct (convnext_tiny)
Device: cuda
Result: FAKE (confidence: 74.7%)
P(fake): 0.7466 P(real): 0.2534
```
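
The relationship between the printed probabilities and the verdict can be sketched as follows. This assumes a two-logit head ordered as (real, fake) and a 0.5 decision threshold; both are assumptions made for illustration, and `inference.py` may compute its output differently.

```python
import math


def softmax2(logit_real, logit_fake):
    """Numerically stable softmax over two logits; returns (p_real, p_fake)."""
    m = max(logit_real, logit_fake)
    e_real, e_fake = math.exp(logit_real - m), math.exp(logit_fake - m)
    total = e_real + e_fake
    return e_real / total, e_fake / total


def summarize(p_fake):
    """Format the verdict line; confidence is the winning class probability."""
    label = "FAKE" if p_fake >= 0.5 else "REAL"  # assumed 0.5 threshold
    confidence = max(p_fake, 1.0 - p_fake)
    return f"Result: {label} (confidence: {confidence * 100:.1f}%)"


if __name__ == "__main__":
    p_real, p_fake = softmax2(-0.54, 0.54)  # made-up logits
    print(summarize(p_fake))
    print(f"P(fake): {p_fake:.4f} P(real): {p_real:.4f}")
```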
If you omit `--checkpoint`, the tool automatically looks for a saved
checkpoint under `classifier/outputs/models/` — first the single-run
`{run_name}_best.pt`, then CV fold files `{run_name}_fold{k}_best.pt`, then
`{run_name}_fold{k}_final.pt`. To use a specific fold:
```bash
python classifier/tools/inference.py image.jpg classifier/configs/phase4/p4_convnext_tiny_100pct.json \
--checkpoint classifier/outputs/models/p4_convnext_tiny_100pct_fold0_best.pt
```
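
The automatic lookup order described above can be sketched with `pathlib`. The function name `resolve_checkpoint` is hypothetical, and picking the lowest-numbered fold when several fold files exist is an assumption; the actual tool may choose differently.

```python
from pathlib import Path


def resolve_checkpoint(models_dir, run_name):
    """Return the first matching checkpoint path, or None if none exists."""
    models_dir = Path(models_dir)
    # 1. Single-run checkpoint.
    single = models_dir / f"{run_name}_best.pt"
    if single.is_file():
        return single
    # 2. Cross-validation fold files: best weights first, then final weights.
    for suffix in ("best", "final"):
        folds = sorted(models_dir.glob(f"{run_name}_fold*_{suffix}.pt"))
        if folds:
            return folds[0]  # assumption: prefer the lowest-numbered fold
    return None


if __name__ == "__main__":
    found = resolve_checkpoint("classifier/outputs/models", "p4_convnext_tiny_100pct")
    print(found if found else "no checkpoint found")
```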
## Generator — training
From the repository root:
```bash
python generator/run.py generator/configs/phase0/p0_vae.json
python generator/run.py generator/configs/phase0/p0_ddpm.json
```
Generator training expects real-face images (default source is `wiki`); use
`--data-dir` to point at your dataset tree. Checkpoints are saved as
`generator/outputs/models/{run_name}_final_ema.pt` (EMA shadow) and
`{run_name}_best_ema.pt` (lowest-FID snapshot).
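
For readers unfamiliar with the term, an EMA shadow is a slowly moving average of the training weights. The sketch below shows the standard update rule on plain Python lists; the decay value of 0.999 is an assumed typical setting, not one taken from the generator configs.

```python
def ema_update(shadow, params, decay=0.999):
    """Blend the current parameters into the shadow copy, element-wise."""
    return [decay * s + (1.0 - decay) * p for s, p in zip(shadow, params)]


if __name__ == "__main__":
    shadow = [0.0, 0.0]
    for step in range(3):
        params = [1.0, -1.0]  # stand-in for the weights after a training step
        shadow = ema_update(shadow, params, decay=0.9)
        print(step, shadow)
```

Sampling from the shadow weights rather than the raw weights usually gives steadier outputs, which is why both saved generator checkpoints are EMA copies.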
## Generator — inference (sampling)
Generate 4×4 sample grids from Phase 5 EMA checkpoints:
```bash
python generator/tools/sampling.py --models p5_gan p5_vae p5_ddpm --samples 10
```
Options:
- `--models`: which models to sample from (`p5_gan`, `p5_vae`, `p5_ddpm`;
  defaults to all three).
- `--samples`: number of grids per model (default 10).
- `--output-dir`: where to write the PNGs (default
  `generator/outputs/samples/final_comparison/`).
- `--truncation`: optional latent truncation for the GAN (lower values give
  less diverse but sharper samples).
- `--device`: `cuda` or `cpu` (default: auto-detect).
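
One common form of latent truncation resamples any latent coordinate whose magnitude exceeds a threshold. The sketch below shows that variant as an illustration of the diversity/sharpness trade-off; whether `sampling.py` uses resampling, clamping, or scaling is not stated here, so treat this as an assumption.

```python
import random


def truncated_latent(dim, truncation, rng=None):
    """Draw a latent vector from N(0, 1) with every coordinate in [-t, t]."""
    if rng is None:
        rng = random.Random()
    z = []
    for _ in range(dim):
        value = rng.gauss(0.0, 1.0)
        # Resample until the coordinate falls inside the truncation range.
        while abs(value) > truncation:
            value = rng.gauss(0.0, 1.0)
        z.append(value)
    return z


if __name__ == "__main__":
    z = truncated_latent(dim=8, truncation=0.7, rng=random.Random(0))
    print([round(v, 3) for v in z])
```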
Each grid is a 4×4 PNG of 16 images sampled from the model's EMA weights.
GAN samples are drawn from random latent vectors, VAE samples decode from the
learned prior, and DDPM samples use 50-step DDIM.
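
DDIM sampling visits only a subset of the training timesteps. A minimal sketch of how 50 sampling steps could be chosen is shown below; the 1000 training steps and the evenly strided spacing are assumptions, not values confirmed by the repository.

```python
def ddim_timesteps(num_train_steps=1000, num_sample_steps=50):
    """Evenly strided timesteps, ordered from most noisy to least noisy."""
    stride = num_train_steps // num_sample_steps
    steps = list(range(0, num_train_steps, stride))[:num_sample_steps]
    return steps[::-1]


if __name__ == "__main__":
    steps = ddim_timesteps()
    print(len(steps), steps[:3], steps[-3:])
```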
## Final takeaway
The project is best understood as a sequence of controlled decisions:
1. cleanly define the data and preprocessing;
2. establish simple baselines;
3. improve one factor at a time;
4. compare model families using saved evidence;
5. report both performance and limitations.
The classifier becomes reliable through source-aware preprocessing, stronger
pretrained backbones, and scaling. The generator improves by first locking the
face-aligned pipeline and then selecting the best recipe inside each model
family before the final GAN/VAE/DDPM comparison.