jalf/DRL_PROJ

Johnny Fernandes afd26f47d2 Final polish

2026-05-14 21:16:03 +01:00

Deep learning face project

This repository contains a two-part deep learning project on the DeepFakeFace (DFF) dataset:

Classifier: detect whether a face image is real or fake.
Generator: train generative models that produce new fake face images.

The project is written as an experimental report. The notebooks are the main deliverable: they show the pipeline, the intermediate failures, the ablations, the decisions, and the final models. Read them in order.

Project story

The work follows the same principle in both parts: start with a simple baseline, inspect what fails, change one important factor at a time, and keep the evidence tied to saved logs and saved artifacts.

For the classifier, the story moves from dataset understanding to preprocessing, baseline models, controlled ablations, Grad-CAM inspection, stronger model families, and data scaling. The final practical classifier is a ResNet50-style pipeline using face crops, 224×224 inputs, ImageNet/default normalization, and no stochastic augmentation at validation/test time.
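As a rough illustration of the normalization step mentioned above, the sketch below applies the standard ImageNet channel statistics to a single RGB pixel. The constants are the widely used ImageNet values; whether the project uses exactly these numbers is an assumption, and `normalize_pixel` is a hypothetical helper, not a function from this repository.

```python
# Standard ImageNet channel statistics in RGB order.
# Assumption: the project's "ImageNet/default normalization" uses these values.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Scale one 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )
```

The same deterministic transform would apply at training, validation, and test time; only the stochastic augmentation is restricted to training.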

For the generator, the story starts with raw baseline failures, then locks the data pipeline before comparing three parallel model-family branches: GAN, VAE, and DDPM. The final comparison keeps quality versus speed central: DDPM gives the best saved FID and visual quality, GAN is the best quality-speed compromise, and VAE is the fastest option but produces the smoothest (blurriest) samples.

How to read the project

Start with the classifier notebooks, then read the generator notebooks. The generator has one linear setup stage followed by three parallel branches: GAN, VAE, and DDPM. Those branches are numbered in reading order, but they are conceptually parallel experiments after the pipeline is selected.

Classifier notebooks

Read these first:

classifier/notebooks/01_eda.ipynb
Dataset composition, real/fake source mapping, image statistics, and shortcut risks.
classifier/notebooks/02_preprocessing.ipynb
Deterministic preprocessing, train-only augmentation, face crops, and normalization.
classifier/notebooks/03_phase1_analysis.ipynb
SimpleCNN and ResNet18 controlled baselines.
classifier/notebooks/04_phase2_analysis.ipynb
Resolution, normalization, source holdouts, facecrop, and augmentation ablations.
classifier/notebooks/05_gradcam_analysis.ipynb
Qualitative localization analysis across the classifier pipeline.
classifier/notebooks/06_phase3_model_family_analysis.ipynb
Stronger pretrained model families and the ResNet50 practical choice.
classifier/notebooks/07_phase4_data_scaling_analysis.ipynb
Data scaling for strong backbones and the final classifier decision.

Generator notebooks

Read these after the classifier:

generator/notebooks/01_baseline_sanity_check.ipynb
Raw baseline failures and why the data pipeline must be fixed first.
generator/notebooks/02_pipeline_selection.ipynb
Controlled pipeline ablations: resolution, alignment, augmentation, and raw/aligned mixing.
generator/notebooks/03_gan_stability_progression.ipynb
GAN branch: DCGAN → WGAN-GP → spectral normalization + GroupNorm + self-attention → 128×128 check.
generator/notebooks/04_vae_loss_progression.ipynb
VAE branch: MSE + KL → perceptual loss → PatchGAN adversarial loss.
generator/notebooks/05_ddpm_recipe_progression.ipynb
DDPM branch: linear schedule → cosine schedule → v-prediction → wider backbone.
generator/notebooks/06_final_family_comparison.ipynb
Final comparison of the selected GAN, VAE, and DDPM recipes under saved Phase 5 conditions.
generator/notebooks/07_final_sample_showcase.ipynb
Curated final sample examples from saved outputs. This is qualitative showcase material, not a replacement for FID.
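For readers unfamiliar with the DDPM branch steps listed above (linear schedule, cosine schedule, v-prediction), the following is a minimal sketch of what those terms usually mean. The step count, beta range, and offset `s` are common defaults from the literature and are assumptions here; the function names are hypothetical and do not come from generator/src.

```python
import math


def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal level for a linearly increasing beta schedule."""
    alpha_bar, out = 1.0, []
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        alpha_bar *= 1.0 - beta
        out.append(alpha_bar)
    return out


def cosine_alpha_bar(num_steps=1000, s=0.008):
    """Cumulative signal level for the cosine schedule (squared-cosine curve)."""
    def f(t):
        return math.cos((t / num_steps + s) / (1 + s) * math.pi / 2) ** 2
    return [f(t + 1) / f(0) for t in range(num_steps)]


def v_target(x0, eps, alpha_bar):
    """v-prediction target for one scalar: sqrt(ab) * eps - sqrt(1 - ab) * x0."""
    return math.sqrt(alpha_bar) * eps - math.sqrt(1.0 - alpha_bar) * x0
```

With these defaults the cosine schedule keeps far more signal in the middle of the chain than the linear one, which is the usual motivation for trying it after a linear baseline.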

What the notebooks do

The notebooks are analysis/report chapters. They load existing configs, logs, figures, saved sample grids, checkpoints, and prediction summaries. They are not intended to launch new training runs.

When a notebook shows a plot or image grid, the surrounding markdown explains:

what the artifact shows;
why it is needed;
how it supports the phase decision;
what limitation remains.

This matters because the project is evaluated not only on final performance but also on the documented evolution of the solution.

Repository layout

DRL_PROJ/
  classifier/
    configs/       experiment configs by phase
    notebooks/     classifier report notebooks
    outputs/       saved logs, figures, Grad-CAM panels, checkpoints
    src/           classifier data, models, training, evaluation
    tests/         unit and smoke tests
    tools/         facecrop, Grad-CAM, inference, reevaluation helpers

  generator/
    configs/       generator configs by phase/family
    notebooks/     generator report notebooks and notebook builder
    outputs/       saved logs, sample grids, final showcase artifacts
    src/           generator data, models, training, metrics
    tests/         unit and smoke tests
    tools/         sampling and utility scripts

  data/            original DFF dataset root, not committed
  cropped/         preprocessed face crops, not committed
  docs/            project statement and supporting documents
  pipeline/        optional remote/GPU orchestration helpers

Rebuilding the generator notebooks

The generator notebooks are generated from a single source file:

cd generator/notebooks
python _build.py

That builder writes the numbered generator notebooks listed above. It uses existing saved logs and artifacts; it does not train models.

Setup

Create a conda environment and install the project requirements:

conda create -n drl python=3.12
conda activate drl
python -m pip install --upgrade pip setuptools wheel
python -m pip install -r requirements.txt

Use Python 3.12; some dependencies (for example facenet-pytorch) are unreliable on 3.13+.

The raw dataset should be placed under data/. Preprocessed crops are stored under cropped/. These folders are intentionally not committed. To download and extract the dataset:

python classifier/tools/fetch_ds.py
python classifier/tools/fetch_ds.py --data-dir /path/to/DFF

Expected layout under the data root: wiki/<identity>/*.jpg, inpainting/..., text2img/..., insight/....
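A quick way to sanity-check the layout above before training is sketched below. It only checks the four top-level source folders and counts wiki/<identity>/*.jpg files; the helper names are hypothetical and this script is not part of the repository.

```python
from pathlib import Path

# Top-level source folders named in the expected layout.
EXPECTED_SOURCES = ("wiki", "inpainting", "text2img", "insight")


def missing_sources(data_root):
    """Return the expected source folders that are absent under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_SOURCES if not (root / name).is_dir()]


def count_wiki_images(data_root):
    """Count JPEG files that match wiki/<identity>/*.jpg."""
    return sum(1 for _ in (Path(data_root) / "wiki").glob("*/*.jpg"))
```

For example, `missing_sources("data")` returning an empty list suggests the dataset was extracted into the default root.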

Classifier — training

From the repository root:

# CPU (slow but valid)
python classifier/run.py classifier/configs/phase4/p4_convnext_tiny_100pct.json

# GPU when CUDA is available
python classifier/run.py classifier/configs/phase4/p4_convnext_tiny_100pct.json --use-gpu

Training uses 5-fold stratified group cross-validation. For each fold k, the best checkpoint is saved as classifier/outputs/models/{run_name}_fold{k}_best.pt, alongside a matching {run_name}_fold{k}_final.pt. Override data or output locations with --data-dir and --output-root.
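To make "stratified group cross-validation" concrete: every group is assigned to exactly one fold, and the folds are kept roughly balanced by class. The sketch below is a simple greedy version using only the standard library. It assumes the group is something like a face identity, which the README does not state, and it is not the splitter used by classifier/src.

```python
from collections import Counter, defaultdict


def stratified_group_folds(labels, groups, n_splits=5):
    """Return one fold index per sample; a group never spans two folds."""
    per_group = defaultdict(Counter)
    for label, group in zip(labels, groups):
        per_group[group][label] += 1

    fold_counts = [Counter() for _ in range(n_splits)]
    fold_of_group = {}
    # Place the largest groups first so smaller ones can even out the folds.
    ordered = sorted(per_group, key=lambda g: (-sum(per_group[g].values()), str(g)))
    for group in ordered:
        counts = per_group[group]
        # Prefer the fold holding the fewest samples of this group's classes,
        # then the smallest fold overall, then the lowest fold index.
        best = min(
            range(n_splits),
            key=lambda k: (
                sum(fold_counts[k][c] for c in counts),
                sum(fold_counts[k].values()),
                k,
            ),
        )
        fold_counts[best].update(counts)
        fold_of_group[group] = best
    return [fold_of_group[g] for g in groups]
```

scikit-learn ships a library implementation of the same idea as `StratifiedGroupKFold`; whether this project relies on it is not stated here.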

Primary delivery model (best Phase 4 detector): config classifier/configs/phase4/p4_convnext_tiny_100pct.json with per-fold weights classifier/outputs/models/p4_convnext_tiny_100pct_fold*_best.pt.

Classifier — inference

Classify a single image as real or fake:

python classifier/tools/inference.py image.jpg classifier/configs/phase4/p4_convnext_tiny_100pct.json

This loads the config and the matching checkpoint, runs the image through the model, and prints a result like:

Image : image.jpg
Model : p4_convnext_tiny_100pct (convnext_tiny)
Device: cuda
Result: FAKE  (confidence: 74.7%)
P(fake): 0.7466   P(real): 0.2534
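The relationship between the printed label, the confidence, and the two probabilities can be sketched as follows. This assumes a two-logit output in (real, fake) order passed through a softmax; the actual head in classifier/src may differ (for example a single sigmoid logit), and `describe` is a hypothetical helper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a short list of logits."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def describe(logits, class_names=("REAL", "FAKE")):
    """Return (label, confidence, probabilities) for one prediction."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    # Confidence is the probability of the predicted class.
    return class_names[best], probs[best], probs
```

Under this reading, the confidence in the sample output is simply P(fake) rounded to one decimal place, since FAKE is the predicted class.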

If you omit --checkpoint, the tool looks for a saved checkpoint under classifier/outputs/models/ in this order: first the single-run {run_name}_best.pt, then the CV fold files {run_name}_fold{k}_best.pt, then {run_name}_fold{k}_final.pt. To use a specific fold:

python classifier/tools/inference.py image.jpg classifier/configs/phase4/p4_convnext_tiny_100pct.json \
  --checkpoint classifier/outputs/models/p4_convnext_tiny_100pct_fold0_best.pt
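The automatic checkpoint lookup order described above can be sketched as a small resolver. This is an illustration, not the code in classifier/tools/inference.py: the function name is hypothetical, and picking the lexicographically first fold file when several exist is an assumption.

```python
from pathlib import Path


def find_checkpoint(run_name, models_dir="classifier/outputs/models"):
    """Resolve a checkpoint: single-run best, then fold best, then fold final."""
    root = Path(models_dir)
    single = root / f"{run_name}_best.pt"
    if single.is_file():
        return single
    for suffix in ("best", "final"):
        folds = sorted(root.glob(f"{run_name}_fold*_{suffix}.pt"))
        if folds:
            # Assumption: take the first fold file in sorted order.
            return folds[0]
    return None
```

Passing --checkpoint explicitly, as in the command above, avoids any ambiguity about which fold is loaded.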

Generator — training

From the repository root:

python generator/run.py generator/configs/phase0/p0_vae.json
python generator/run.py generator/configs/phase0/p0_ddpm.json

Generator training expects real-face images (the default source is wiki); use --data-dir to point at your dataset tree. Checkpoints are saved under generator/outputs/models/ as {run_name}_final_ema.pt (the EMA shadow weights) and {run_name}_best_ema.pt (the lowest-FID snapshot).
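The two checkpoint kinds correspond to two simple bookkeeping rules, sketched below on plain dictionaries of floats. The decay value and both helper names are assumptions for illustration; the training loop in generator/src is not reproduced here.

```python
def ema_update(shadow, weights, decay=0.999):
    """Move each EMA shadow value a small step toward the current weight."""
    return {
        name: decay * shadow[name] + (1.0 - decay) * weights[name]
        for name in shadow
    }


def is_new_best(fid, best_so_far):
    """Lower FID is better; the first evaluation always counts as best."""
    return best_so_far is None or fid < best_so_far
```

In a loop, `ema_update` would run after every optimizer step, while `is_new_best` would decide at each FID evaluation whether to overwrite the best snapshot.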

Generator — inference (sampling)

Generate 4×4 sample grids from Phase 5 EMA checkpoints:

python generator/tools/sampling.py --models p5_gan p5_vae p5_ddpm --samples 10

Options:

--models — which models to sample from (p5_gan, p5_vae, p5_ddpm; defaults to all three).
--samples — number of grids per model (default 10).
--output-dir — where to write the PNGs (default generator/outputs/samples/final_comparison/).
--truncation — optional latent truncation for the GAN (lower = less diversity but sharper).
--device — cuda or cpu (default: auto-detect).

Each grid is a 4×4 PNG of 16 images sampled from the model's EMA weights. GAN samples are drawn from random latent vectors, VAE samples decode from the learned prior, and DDPM samples use 50-step DDIM.
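To illustrate what "50-step DDIM" implies, the sketch below picks an evenly spaced, descending subset of training timesteps to visit during sampling. It assumes a 1000-step training schedule, which the README does not state, and the spacing convention is one of several in common use; `ddim_timesteps` is a hypothetical helper.

```python
def ddim_timesteps(num_train_steps=1000, num_sample_steps=50):
    """Evenly spaced timesteps, from the noisiest step down toward zero."""
    stride = num_train_steps // num_sample_steps
    steps = list(range(num_train_steps - 1, -1, -stride))
    return steps[:num_sample_steps]
```

Visiting 50 timesteps instead of every training step is what makes DDIM sampling much cheaper than ancestral DDPM sampling, at some cost in fidelity.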

Final takeaway

The project is best understood as a sequence of controlled decisions:

cleanly define the data and preprocessing;
establish simple baselines;
improve one factor at a time;
compare model families using saved evidence;
report both performance and limitations.

The classifier becomes reliable through source-aware preprocessing, stronger pretrained backbones, and scaling. The generator improves by first locking the face-aligned pipeline and then selecting the best recipe inside each model family before the final GAN/VAE/DDPM comparison.
