# Deep learning face project

This repository contains a two-part deep learning project on the DeepFakeFace (DFF) dataset:

1. **Classifier:** detect whether a face image is real or fake.
2. **Generator:** train generative models that produce new fake face images.

The project is written as an experimental report. The notebooks are the main deliverable: they show the pipeline, the intermediate failures, the ablations, the decisions, and the final models. Read them in order.

## Project story

The work follows the same principle in both parts: start with a simple baseline, inspect what fails, change one important factor at a time, and keep the evidence tied to saved logs and saved artifacts.

For the **classifier**, the story moves from dataset understanding to preprocessing, baseline models, controlled ablations, Grad-CAM inspection, stronger model families, and data scaling. The final practical classifier is a ResNet50-style pipeline using face crops, 224×224 inputs, ImageNet/default normalization, and no stochastic augmentation at validation/test time.

For the **generator**, the story starts with raw baseline failures, then locks the data pipeline before comparing three parallel model-family branches: GAN, VAE, and DDPM. The final comparison keeps the quality-versus-speed trade-off central: DDPM gives the best saved FID and visual quality, GAN is the best quality-speed compromise, and VAE is the fastest option but produces the smoothest samples.

## How to read the project

Start with the classifier notebooks, then read the generator notebooks. The generator has one linear setup stage followed by three parallel branches: GAN, VAE, and DDPM. Those branches are numbered in reading order, but they are conceptually parallel experiments that start once the pipeline is selected.

### Classifier notebooks

Read these first:

1. `classifier/notebooks/01_eda.ipynb`
   Dataset composition, real/fake source mapping, image statistics, and shortcut risks.
2. `classifier/notebooks/02_preprocessing.ipynb`
   Deterministic preprocessing, train-only augmentation, face crops, and normalization.
3. `classifier/notebooks/03_phase1_analysis.ipynb`
   SimpleCNN and ResNet18 controlled baselines.
4. `classifier/notebooks/04_phase2_analysis.ipynb`
   Resolution, normalization, source holdouts, facecrop, and augmentation ablations.
5. `classifier/notebooks/05_gradcam_analysis.ipynb`
   Qualitative localization analysis across the classifier pipeline.
6. `classifier/notebooks/06_phase3_model_family_analysis.ipynb`
   Stronger pretrained model families and the ResNet50 practical choice.
7. `classifier/notebooks/07_phase4_data_scaling_analysis.ipynb`
   Data scaling for strong backbones and the final classifier decision.

### Generator notebooks

Read these after the classifier:

1. `generator/notebooks/01_baseline_sanity_check.ipynb`
   Raw baseline failures and why the data pipeline must be fixed first.
2. `generator/notebooks/02_pipeline_selection.ipynb`
   Controlled pipeline ablations: resolution, alignment, augmentation, and raw/aligned mixing.
3. `generator/notebooks/03_gan_stability_progression.ipynb`
   GAN branch: DCGAN → WGAN-GP → spectral normalization + GroupNorm + self-attention → 128×128 check.
4. `generator/notebooks/04_vae_loss_progression.ipynb`
   VAE branch: MSE + KL → perceptual loss → PatchGAN adversarial loss.
5. `generator/notebooks/05_ddpm_recipe_progression.ipynb`
   DDPM branch: linear schedule → cosine schedule → v-prediction → wider backbone.
6. `generator/notebooks/06_final_family_comparison.ipynb`
   Final comparison of the selected GAN, VAE, and DDPM recipes under saved Phase 5 conditions.
7. `generator/notebooks/07_final_sample_showcase.ipynb`
   Curated final samples from saved outputs. This is qualitative showcase material, not a replacement for FID.

## What the notebooks do

The notebooks are analysis/report chapters. They load existing configs, logs, figures, saved sample grids, checkpoints, and prediction summaries.
They are not intended to launch new training runs.

When a notebook shows a plot or image grid, the surrounding markdown explains:

- what the artifact shows;
- why it is needed;
- how it supports the phase decision;
- what limitation remains.

This is important because the project is evaluated not only on final performance but also on the documented evolution of the solution.

## Repository layout

```text
DRL_PROJ/
  classifier/
    configs/      experiment configs by phase
    notebooks/    classifier report notebooks
    outputs/      saved logs, figures, Grad-CAM panels, checkpoints
    src/          classifier data, models, training, evaluation
    tests/        unit and smoke tests
    tools/        facecrop, Grad-CAM, inference, reevaluation helpers
  generator/
    configs/      generator configs by phase/family
    notebooks/    generator report notebooks and notebook builder
    outputs/      saved logs, sample grids, final showcase artifacts
    src/          generator data, models, training, metrics
    tests/        unit and smoke tests
    tools/        sampling and utility scripts
  data/           original DFF dataset root, not committed
  cropped/        preprocessed face crops, not committed
  docs/           project statement and supporting documents
  pipeline/       optional remote/GPU orchestration helpers
```

## Rebuilding the generator notebooks

The generator notebooks are generated from a single source file:

```bash
cd generator/notebooks
python _build.py
```

That builder writes the numbered generator notebooks listed above. It uses existing saved logs and artifacts; it does not train models.

## Setup

Create a conda environment and install the project requirements:

```bash
conda create -n drl python=3.12
conda activate drl
python -m pip install --upgrade pip setuptools wheel
python -m pip install -r requirements.txt
```

Use **Python 3.12**; some dependencies (for example `facenet-pytorch`) are unreliable on 3.13+.

The raw dataset should be placed under `data/`. Preprocessed crops are stored under `cropped/`. These folders are intentionally not committed.
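The setup pins Python 3.12 and expects the uncommitted `data/` and `cropped/` folders. A short preflight check can therefore catch a wrong interpreter or a missing dataset before a long run fails partway through. The sketch below is illustrative and not part of the repository. The function names `python_version_ok` and `missing_data_dirs` are hypothetical, and treating every 3.13+ interpreter as unsupported is an assumption drawn from the dependency note in this section.

```python
import sys
import warnings
from pathlib import Path

# Assumption: Python 3.12 is the only supported version; 3.13+ is treated as unsupported.
RECOMMENDED = (3, 12)


def python_version_ok(version=None) -> bool:
    """Return True only when the interpreter's major.minor matches RECOMMENDED."""
    major_minor = tuple((version or sys.version_info)[:2])
    if major_minor >= (3, 13):
        # Warn instead of raising so the check can be used in notebooks too.
        warnings.warn(
            f"Python {major_minor[0]}.{major_minor[1]} detected; some dependencies "
            "(for example facenet-pytorch) are unreliable on 3.13+."
        )
        return False
    return major_minor == RECOMMENDED


def missing_data_dirs(repo_root=".") -> list:
    """List which of the uncommitted folders (data/, cropped/) do not exist yet."""
    root = Path(repo_root)
    return [name for name in ("data", "cropped") if not (root / name).is_dir()]


if __name__ == "__main__":
    print("Python version OK:", python_version_ok())
    print("Missing data folders:", missing_data_dirs() or "none")
```

Run it from the repository root. A missing `cropped/` folder is expected until the face crops have been produced.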
To download and extract the dataset:

```bash
# Download with the tool's default settings
python classifier/tools/fetch_ds.py
# Or download into an explicit data root
python classifier/tools/fetch_ds.py --data-dir /path/to/DFF
```

Expected layout under the data root: `wiki//*.jpg`, `inpainting/...`, `text2img/...`, `insight/...`.

## Classifier — training

From the repository root:

```bash
# CPU (slow but valid)
python classifier/run.py classifier/configs/phase4/p4_convnext_tiny_100pct.json

# GPU when CUDA is available
python classifier/run.py classifier/configs/phase4/p4_convnext_tiny_100pct.json --use-gpu
```

Training uses 5-fold stratified group cross-validation. Per-fold checkpoints are saved as `classifier/outputs/models/{run_name}_fold{k}_best.pt` (plus `{run_name}_fold{k}_final.pt`). Override the data or output locations with `--data-dir` and `--output-root`.

**Primary delivery model** (best Phase 4 detector): config `classifier/configs/phase4/p4_convnext_tiny_100pct.json` with per-fold weights `classifier/outputs/models/p4_convnext_tiny_100pct_fold*_best.pt`.

## Classifier — inference

Classify a single image as real or fake:

```bash
python classifier/tools/inference.py image.jpg classifier/configs/phase4/p4_convnext_tiny_100pct.json
```

This loads the config and the matching checkpoint, runs the image through the model, and prints a result like:

```
Image : image.jpg
Model : p4_convnext_tiny_100pct (convnext_tiny)
Device: cuda
Result: FAKE (confidence: 74.7%)
P(fake): 0.7466
P(real): 0.2534
```

If you omit `--checkpoint`, the tool looks for a saved checkpoint under `classifier/outputs/models/` in this order: first the single-run file `{run_name}_best.pt`, then the CV fold files `{run_name}_fold{k}_best.pt`, then `{run_name}_fold{k}_final.pt`.
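The fallback order above matters when several runs share one output folder. The sketch below restates it in Python for readers who want to locate checkpoints from their own scripts. It is not the code in `classifier/tools/inference.py`. The function name `resolve_checkpoint` is hypothetical, and picking the lowest fold index when several fold files exist is an assumption, because the README does not say which fold the tool uses.

```python
from pathlib import Path
from typing import Optional

# Assumed default location; the real tool may derive it from --output-root.
MODELS_DIR = Path("classifier/outputs/models")


def resolve_checkpoint(run_name: str, models_dir: Path = MODELS_DIR) -> Optional[Path]:
    """Return the first existing checkpoint following the documented lookup order.

    1. single-run file  {run_name}_best.pt
    2. CV fold files    {run_name}_fold{k}_best.pt
    3. CV fold files    {run_name}_fold{k}_final.pt
    Assumption: when several folds exist, the lowest fold index k is chosen.
    """
    single = models_dir / f"{run_name}_best.pt"
    if single.is_file():
        return single

    for suffix in ("best", "final"):
        candidates = []
        for path in models_dir.glob(f"{run_name}_fold*_{suffix}.pt"):
            # Strip the known prefix and suffix to recover the fold index k.
            fold = path.name[len(run_name) + len("_fold"): -len(f"_{suffix}.pt")]
            if fold.isdigit():
                candidates.append((int(fold), path))
        if candidates:
            return min(candidates)[1]
    return None
```

For example, `resolve_checkpoint("p4_convnext_tiny_100pct")` would return the fold-0 `_best.pt` file when no single-run checkpoint exists. It returns `None` when nothing matches, and passing an explicit `--checkpoint` avoids the lookup entirely.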
To use a specific fold:

```bash
python classifier/tools/inference.py image.jpg classifier/configs/phase4/p4_convnext_tiny_100pct.json \
  --checkpoint classifier/outputs/models/p4_convnext_tiny_100pct_fold0_best.pt
```

## Generator — training

From the repository root:

```bash
python generator/run.py generator/configs/phase0/p0_vae.json
python generator/run.py generator/configs/phase0/p0_ddpm.json
```

Generator training expects real-face images (the default source is `wiki`); use `--data-dir` to point at your dataset tree. Checkpoints are saved in `generator/outputs/models/` as `{run_name}_final_ema.pt` (EMA shadow weights) and `{run_name}_best_ema.pt` (lowest-FID snapshot).

## Generator — inference (sampling)

Generate 4×4 sample grids from the Phase 5 EMA checkpoints:

```bash
python generator/tools/sampling.py --models p5_gan p5_vae p5_ddpm --samples 10
```

Options:

- `--models` — which models to sample from (`p5_gan`, `p5_vae`, `p5_ddpm`; defaults to all three).
- `--samples` — number of grids per model (default 10).
- `--output-dir` — where to write the PNGs (default `generator/outputs/samples/final_comparison/`).
- `--truncation` — optional latent truncation for the GAN (lower values give sharper but less diverse samples).
- `--device` — `cuda` or `cpu` (default: auto-detect).

Each grid is a 4×4 PNG of 16 images sampled from the model's EMA weights. GAN samples are drawn from random latent vectors, VAE samples are decoded from latents drawn from the learned prior, and DDPM samples use 50-step DDIM.

## Final takeaway

The project is best understood as a sequence of controlled decisions:

1. cleanly define the data and preprocessing;
2. establish simple baselines;
3. improve one factor at a time;
4. compare model families using saved evidence;
5. report both performance and limitations.

The classifier becomes reliable through source-aware preprocessing, stronger pretrained backbones, and data scaling.
The generator improves by first locking the face-aligned pipeline and then selecting the best recipe inside each model family before the final GAN/VAE/DDPM comparison.