Deep learning face project
This repository contains a two-part deep learning project on the DeepFakeFace (DFF) dataset:
- Classifier: detect whether a face image is real or fake.
- Generator: train generative models that produce new fake face images.
The project is written as an experimental report. The notebooks are the main deliverable: they show the pipeline, the intermediate failures, the ablations, the decisions, and the final models. Read them in order.
Project story
The work follows the same principle in both parts: start with a simple baseline, inspect what fails, change one important factor at a time, and keep the evidence tied to saved logs and saved artifacts.
For the classifier, the story moves from dataset understanding to preprocessing, baseline models, controlled ablations, Grad-CAM inspection, stronger model families, and data scaling. The final practical classifier is a ResNet50-style pipeline using face crops, 224×224 inputs, ImageNet/default normalization, and no stochastic augmentation at validation/test time.
For the generator, the story starts with raw baseline failures, then locks the data pipeline before comparing three parallel model-family branches: GAN, VAE, and DDPM. The final comparison keeps quality versus speed central: DDPM gives the best saved FID and visual quality, GAN is the best quality-speed compromise, and VAE is the fastest option but gives the smoothest samples.
How to read the project
Start with the classifier notebooks, then read the generator notebooks. The generator has one linear setup stage followed by three parallel branches: GAN, VAE, and DDPM. Those branches are numbered in reading order, but they are conceptually parallel experiments after the pipeline is selected.
Classifier notebooks
Read these first:
classifier/notebooks/01_eda.ipynb
Dataset composition, real/fake source mapping, image statistics, and shortcut risks.
classifier/notebooks/02_preprocessing.ipynb
Deterministic preprocessing, train-only augmentation, face crops, and normalization.
classifier/notebooks/03_phase1_analysis.ipynb
SimpleCNN and ResNet18 controlled baselines.
classifier/notebooks/04_phase2_analysis.ipynb
Resolution, normalization, source holdouts, facecrop, and augmentation ablations.
classifier/notebooks/05_gradcam_analysis.ipynb
Qualitative localization analysis across the classifier pipeline.
classifier/notebooks/06_phase3_model_family_analysis.ipynb
Stronger pretrained model families and the ResNet50 practical choice.
classifier/notebooks/07_phase4_data_scaling_analysis.ipynb
Data scaling for strong backbones and the final classifier decision.
Generator notebooks
Read these after the classifier:
generator/notebooks/01_baseline_sanity_check.ipynb
Raw baseline failures and why the data pipeline must be fixed first.
generator/notebooks/02_pipeline_selection.ipynb
Controlled pipeline ablations: resolution, alignment, augmentation, and raw/aligned mixing.
generator/notebooks/03_gan_stability_progression.ipynb
GAN branch: DCGAN → WGAN-GP → spectral normalization + GroupNorm + self-attention → 128×128 check.
generator/notebooks/04_vae_loss_progression.ipynb
VAE branch: MSE + KL → perceptual loss → PatchGAN adversarial loss.
generator/notebooks/05_ddpm_recipe_progression.ipynb
DDPM branch: linear schedule → cosine schedule → v-prediction → wider backbone.
generator/notebooks/06_final_family_comparison.ipynb
Final comparison of the selected GAN, VAE, and DDPM recipes under saved Phase 5 conditions.
generator/notebooks/07_final_sample_showcase.ipynb
Curated final sample examples from saved outputs. This is qualitative showcase material, not a replacement for FID.
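For readers unfamiliar with the first two steps of the DDPM branch, the difference between a linear and a cosine noise schedule can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the constants (beta range 1e-4 to 0.02, offset s = 0.008, clip at 0.999) are the commonly published defaults, assumed here rather than read from the project configs.

```python
import math


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced per-step noise variances (assumed default range)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cosine_betas(num_steps, s=0.008, max_beta=0.999):
    """Betas derived from a squared-cosine cumulative signal curve."""
    def alpha_bar(t):
        return math.cos((t / num_steps + s) / (1 + s) * math.pi / 2) ** 2

    # beta_t = 1 - alpha_bar(t + 1) / alpha_bar(t), clipped for stability.
    return [
        min(1 - alpha_bar(i + 1) / alpha_bar(i), max_beta)
        for i in range(num_steps)
    ]


def alpha_bars(betas):
    """Cumulative product of (1 - beta): fraction of signal kept per step."""
    kept, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        kept.append(running)
    return kept
```

Under these assumed constants, comparing `alpha_bars(linear_betas(1000))` with `alpha_bars(cosine_betas(1000))` shows that the cosine schedule keeps more signal in the middle of the chain, which is the usual motivation for trying it after a linear baseline.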
What the notebooks do
The notebooks are analysis/report chapters. They load existing configs, logs, figures, saved sample grids, checkpoints, and prediction summaries. They are not intended to launch new training runs.
When a notebook shows a plot or image grid, the surrounding markdown explains:
- what the artifact shows;
- why it is needed;
- how it supports the phase decision;
- what limitation remains.
This matters because the project is evaluated not only on final performance but also on the documented evolution of the solution.
Repository layout
DRL_PROJ/
  classifier/
    configs/     experiment configs by phase
    notebooks/   classifier report notebooks
    outputs/     saved logs, figures, Grad-CAM panels, checkpoints
    src/         classifier data, models, training, evaluation
    tests/       unit and smoke tests
    tools/       facecrop, Grad-CAM, inference, reevaluation helpers
  generator/
    configs/     generator configs by phase/family
    notebooks/   generator report notebooks and notebook builder
    outputs/     saved logs, sample grids, final showcase artifacts
    src/         generator data, models, training, metrics
    tests/       unit and smoke tests
    tools/       sampling and utility scripts
  data/          original DFF dataset root, not committed
  cropped/       preprocessed face crops, not committed
  docs/          project statement and supporting documents
  pipeline/      optional remote/GPU orchestration helpers
Rebuilding the generator notebooks
The generator notebooks are generated from a single source file:
cd generator/notebooks
python _build.py
That builder writes the numbered generator notebooks listed above. It uses existing saved logs and artifacts; it does not train models.
Setup
Create a conda environment and install the project requirements:
conda create -n drl python=3.12
conda activate drl
python -m pip install --upgrade pip setuptools wheel
python -m pip install -r requirements.txt
Use Python 3.12; some dependencies (for example facenet-pytorch) are
unreliable on 3.13+.
The raw dataset should be placed under data/. Preprocessed crops are stored
under cropped/. These folders are intentionally not committed. To download
and extract the dataset:
# Default location
python classifier/tools/fetch_ds.py
# Custom data root
python classifier/tools/fetch_ds.py --data-dir /path/to/DFF
Expected layout under the data root: wiki/<identity>/*.jpg,
inpainting/..., text2img/..., insight/....
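Before training, it can help to confirm that the data root matches the layout above. The following is a minimal sketch, not part of the repository: `missing_sources` and `count_wiki_jpgs` are hypothetical helpers, and only the `wiki/<identity>/*.jpg` pattern is taken from this README (the inner layout of the other sources is not assumed).

```python
from pathlib import Path

# Source folders named in this README; "wiki" holds the real faces.
EXPECTED_SOURCES = ("wiki", "inpainting", "text2img", "insight")


def missing_sources(data_root):
    """Return the expected source folders that are absent under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_SOURCES if not (root / name).is_dir()]


def count_wiki_jpgs(data_root):
    """Count files matching wiki/<identity>/*.jpg under data_root."""
    return sum(1 for _ in (Path(data_root) / "wiki").glob("*/*.jpg"))
```

For example, `missing_sources("data")` returning an empty list means all four source folders are present; a non-empty list names the folders still to be downloaded or extracted.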
Classifier — training
From the repository root:
# CPU (slow but valid)
python classifier/run.py classifier/configs/phase4/p4_convnext_tiny_100pct.json
# GPU when CUDA is available
python classifier/run.py classifier/configs/phase4/p4_convnext_tiny_100pct.json --use-gpu
Training uses 5-fold stratified group cross-validation. Per-fold checkpoints
are saved as classifier/outputs/models/{run_name}_fold{k}_best.pt (and
_final.pt). Override data or output locations with --data-dir and
--output-root.
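The idea behind stratified group cross-validation is that every sample of a group lands in the same fold while class proportions stay roughly even across folds. The sketch below is a simplified greedy illustration, not the repository's splitter: `stratified_group_folds` is a hypothetical name, and treating the face identity as the group is an assumption. scikit-learn ships a production version as `sklearn.model_selection.StratifiedGroupKFold`.

```python
from collections import Counter, defaultdict


def stratified_group_folds(labels, groups, n_folds=5):
    """Greedily assign whole groups to folds while balancing class counts.

    Returns one fold index per sample, in input order.
    """
    # Per-group class counts, e.g. {"id7": Counter({0: 1, 1: 3})}.
    group_counts = defaultdict(Counter)
    for label, group in zip(labels, groups):
        group_counts[group][label] += 1

    classes = sorted(set(labels))
    fold_counts = [Counter() for _ in range(n_folds)]
    group_to_fold = {}

    # Place large groups first so small ones can even out the folds later.
    ordered = sorted(
        group_counts,
        key=lambda g: (-sum(group_counts[g].values()), str(g)),
    )
    for group in ordered:
        def cost(fold):
            # Sum of squared per-class totals if the group joined this fold.
            return sum(
                (fold_counts[fold][c] + group_counts[group][c]) ** 2
                for c in classes
            )

        best = min(range(n_folds), key=cost)
        fold_counts[best].update(group_counts[group])
        group_to_fold[group] = best

    return [group_to_fold[g] for g in groups]
```

Keeping a group in a single fold is what prevents the same identity from appearing in both the training and validation split of a fold.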
Primary delivery model (best Phase 4 detector): config
classifier/configs/phase4/p4_convnext_tiny_100pct.json with per-fold
weights classifier/outputs/models/p4_convnext_tiny_100pct_fold*_best.pt.
Classifier — inference
Classify a single image as real or fake:
python classifier/tools/inference.py image.jpg classifier/configs/phase4/p4_convnext_tiny_100pct.json
This loads the config and the matching checkpoint, runs the image through the model, and prints a result like:
Image : image.jpg
Model : p4_convnext_tiny_100pct (convnext_tiny)
Device: cuda
Result: FAKE (confidence: 74.7%)
P(fake): 0.7466 P(real): 0.2534
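The relation between the two probabilities and the reported confidence can be sketched as follows. This is an assumption about how the printed line is derived (confidence as the larger of the two probabilities, with a 0.5 decision threshold); `format_result` is a hypothetical helper, not the tool's own code.

```python
def format_result(p_fake):
    """Turn P(fake) into a label and confidence line (assumed 0.5 threshold)."""
    p_real = 1.0 - p_fake
    label = "FAKE" if p_fake >= 0.5 else "REAL"
    # Confidence is taken as the probability of the predicted class.
    confidence = max(p_fake, p_real)
    return f"Result: {label} (confidence: {confidence * 100:.1f}%)"
```

With the values shown above, `format_result(0.7466)` gives `Result: FAKE (confidence: 74.7%)`.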
If you omit --checkpoint, the tool automatically looks for a saved
checkpoint under classifier/outputs/models/ — first the single-run
{run_name}_best.pt, then CV fold files {run_name}_fold{k}_best.pt, then
{run_name}_fold{k}_final.pt. To use a specific fold:
python classifier/tools/inference.py image.jpg classifier/configs/phase4/p4_convnext_tiny_100pct.json \
--checkpoint classifier/outputs/models/p4_convnext_tiny_100pct_fold0_best.pt
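The automatic lookup order described above can be sketched with `pathlib`. This is an illustration of the documented order only: `find_checkpoint` is a hypothetical name, and picking the first fold file in sorted order is an assumption, since the README does not say which fold the tool prefers.

```python
from pathlib import Path


def find_checkpoint(models_dir, run_name):
    """Return the first checkpoint matching the documented lookup order."""
    models_dir = Path(models_dir)

    # 1. Single-run checkpoint.
    single = models_dir / f"{run_name}_best.pt"
    if single.is_file():
        return single

    # 2. Cross-validation fold files: best first, then final.
    for suffix in ("best", "final"):
        matches = sorted(models_dir.glob(f"{run_name}_fold*_{suffix}.pt"))
        if matches:
            return matches[0]  # assumed tie-break: lowest-sorting fold

    return None
```

A call such as `find_checkpoint("classifier/outputs/models", "p4_convnext_tiny_100pct")` returns `None` when no checkpoint exists, which is the case where passing `--checkpoint` explicitly is required.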
Generator — training
From the repository root:
python generator/run.py generator/configs/phase0/p0_vae.json
python generator/run.py generator/configs/phase0/p0_ddpm.json
Generator training expects real-face images (default source is wiki); use
--data-dir to point at your dataset tree. Checkpoints are saved under
generator/outputs/models/{run_name}_final_ema.pt (EMA shadow) and
{run_name}_best_ema.pt (lowest-FID snapshot).
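An EMA shadow is a slowly moving average of the model weights that is updated after each training step and used for sampling. The sketch below shows the update rule on plain floats; `ema_update` is a hypothetical name, the decay of 0.999 is an assumed value, and in real training the dictionary values would be tensors rather than floats.

```python
def ema_update(shadow, params, decay=0.999):
    """Blend the current parameters into the EMA shadow, in place.

    shadow and params map parameter names to values; decay is an
    assumed hyperparameter, not read from the project configs.
    """
    for name, value in params.items():
        shadow[name] = decay * shadow[name] + (1.0 - decay) * value
    return shadow
```

A decay close to 1 makes the shadow change slowly, which smooths out step-to-step noise in the raw weights.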
Generator — inference (sampling)
Generate 4×4 sample grids from Phase 5 EMA checkpoints:
python generator/tools/sampling.py --models p5_gan p5_vae p5_ddpm --samples 10
Options:
--models: which models to sample from (p5_gan, p5_vae, p5_ddpm; defaults to all three).
--samples: number of grids per model (default 10).
--output-dir: where to write the PNGs (default generator/outputs/samples/final_comparison/).
--truncation: optional latent truncation for the GAN (lower = less diversity but sharper).
--device: cuda or cpu (default: auto-detect).
Each grid is a 4×4 PNG of 16 images sampled from the model's EMA weights. GAN samples are drawn from random latent vectors, VAE samples decode from the learned prior, and DDPM samples use 50-step DDIM.
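DDIM sampling visits only a subset of the timesteps used in training, which is where the speedup comes from. The sketch below picks an evenly spaced subset; `ddim_timesteps` is a hypothetical helper, and both the 1000 training steps and the even spacing are assumptions, since this README states only that sampling uses 50 steps.

```python
def ddim_timesteps(num_train_steps=1000, num_sample_steps=50):
    """Return evenly spaced timesteps in descending (sampling) order."""
    if not 1 <= num_sample_steps <= num_train_steps:
        raise ValueError("num_sample_steps must be in [1, num_train_steps]")
    stride = num_train_steps // num_sample_steps
    # Keep exactly num_sample_steps entries even when the division is uneven.
    steps = list(range(0, num_train_steps, stride))[:num_sample_steps]
    return steps[::-1]
```

With the assumed defaults this yields 50 timesteps running from 980 down to 0 in strides of 20, so the denoiser is called 50 times per sample instead of 1000.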
Final takeaway
The project is best understood as a sequence of controlled decisions:
- cleanly define the data and preprocessing;
- establish simple baselines;
- improve one factor at a time;
- compare model families using saved evidence;
- report both performance and limitations.
The classifier becomes reliable through source-aware preprocessing, stronger pretrained backbones, and scaling. The generator improves by first locking the face-aligned pipeline and then selecting the best recipe inside each model family before the final GAN/VAE/DDPM comparison.