# Deep learning face project

This repository contains a two-part deep learning project on the
DeepFakeFace (DFF) dataset:

1. **Classifier:** detect whether a face image is real or fake.
2. **Generator:** train generative models that produce new fake face images.

The project is written as an experimental report. The notebooks are the main
deliverable: they show the pipeline, the intermediate failures, the ablations,
the decisions, and the final models. Read them in order.

## Project story

Both parts follow the same principle: start with a simple baseline, inspect
what fails, change one important factor at a time, and keep the evidence tied
to saved logs and saved artifacts.

For the **classifier**, the story moves from dataset understanding to
preprocessing, baseline models, controlled ablations, Grad-CAM inspection,
stronger model families, and data scaling. The final practical classifier is a
ResNet50-style pipeline using face crops, 224×224 inputs, ImageNet/default
normalization, and no stochastic augmentation at validation/test time.

For the **generator**, the story starts with raw baseline failures, then locks
the data pipeline before comparing three parallel model-family branches:
GAN, VAE, and DDPM. The final comparison centers on the quality-speed
trade-off: DDPM gives the best saved FID and visual quality, GAN is the best
quality-speed compromise, and VAE is the fastest option but produces the
smoothest samples.

## How to read the project

Start with the classifier notebooks, then read the generator notebooks. The
generator has one linear setup stage followed by three parallel branches:
GAN, VAE, and DDPM. Those branches are numbered in reading order, but they are
conceptually parallel experiments after the pipeline is selected.

### Classifier notebooks

Read these first:

1. `classifier/notebooks/01_eda.ipynb`
   Dataset composition, real/fake source mapping, image statistics, and
   shortcut risks.
2. `classifier/notebooks/02_preprocessing.ipynb`
   Deterministic preprocessing, train-only augmentation, face crops, and
   normalization.
3. `classifier/notebooks/03_phase1_analysis.ipynb`
   SimpleCNN and ResNet18 controlled baselines.
4. `classifier/notebooks/04_phase2_analysis.ipynb`
   Resolution, normalization, source holdouts, facecrop, and augmentation
   ablations.
5. `classifier/notebooks/05_gradcam_analysis.ipynb`
   Qualitative localization analysis across the classifier pipeline.
6. `classifier/notebooks/06_phase3_model_family_analysis.ipynb`
   Stronger pretrained model families and the ResNet50 practical choice.
7. `classifier/notebooks/07_phase4_data_scaling_analysis.ipynb`
   Data scaling for strong backbones and the final classifier decision.

### Generator notebooks

Read these after the classifier:

1. `generator/notebooks/01_baseline_sanity_check.ipynb`
   Raw baseline failures and why the data pipeline must be fixed first.
2. `generator/notebooks/02_pipeline_selection.ipynb`
   Controlled pipeline ablations: resolution, alignment, augmentation, and
   raw/aligned mixing.
3. `generator/notebooks/03_gan_stability_progression.ipynb`
   GAN branch: DCGAN → WGAN-GP → spectral normalization + GroupNorm +
   self-attention → 128×128 check.
4. `generator/notebooks/04_vae_loss_progression.ipynb`
   VAE branch: MSE + KL → perceptual loss → PatchGAN adversarial loss.
5. `generator/notebooks/05_ddpm_recipe_progression.ipynb`
   DDPM branch: linear schedule → cosine schedule → v-prediction → wider
   backbone.
6. `generator/notebooks/06_final_family_comparison.ipynb`
   Final comparison of the selected GAN, VAE, and DDPM recipes under saved
   Phase 5 conditions.
7. `generator/notebooks/07_final_sample_showcase.ipynb`
   Curated final sample examples from saved outputs. This is qualitative
   showcase material, not a replacement for FID.

## What the notebooks do

The notebooks are analysis/report chapters. They load existing configs, logs,
figures, saved sample grids, checkpoints, and prediction summaries. They are
not intended to launch new training runs.

When a notebook shows a plot or image grid, the surrounding markdown explains:

- what the artifact shows;
- why it is needed;
- how it supports the phase decision;
- what limitation remains.

This matters because the project is evaluated not only on final performance
but also on the documented evolution of the solution.

## Repository layout

```text
DRL_PROJ/
  classifier/
    configs/      experiment configs by phase
    notebooks/    classifier report notebooks
    outputs/      saved logs, figures, Grad-CAM panels, checkpoints
    src/          classifier data, models, training, evaluation
    tests/        unit and smoke tests
    tools/        facecrop, Grad-CAM, inference, reevaluation helpers

  generator/
    configs/      generator configs by phase/family
    notebooks/    generator report notebooks and notebook builder
    outputs/      saved logs, sample grids, final showcase artifacts
    src/          generator data, models, training, metrics
    tests/        unit and smoke tests
    tools/        sampling and utility scripts

  data/           original DFF dataset root, not committed
  cropped/        preprocessed face crops, not committed
  docs/           project statement and supporting documents
  pipeline/       optional remote/GPU orchestration helpers
```

## Rebuilding the generator notebooks

The generator notebooks are generated from a single source file:

```bash
cd generator/notebooks
python _build.py
```

That builder writes the numbered generator notebooks listed above. It uses
existing saved logs and artifacts; it does not train models.

## Setup

Create a conda environment and install the project requirements:

```bash
conda create -n drl python=3.12
conda activate drl
python -m pip install --upgrade pip setuptools wheel
python -m pip install -r requirements.txt
```

Use **Python 3.12**; some dependencies (for example `facenet-pytorch`) are
unreliable on 3.13+.

The raw dataset should be placed under `data/`. Preprocessed crops are stored
under `cropped/`. These folders are intentionally not committed. To download
and extract the dataset:

```bash
# Download into the default data root
python classifier/tools/fetch_ds.py

# Or download into a custom data root
python classifier/tools/fetch_ds.py --data-dir /path/to/DFF
```

Expected layout under the data root: `wiki/<identity>/*.jpg`,
`inpainting/...`, `text2img/...`, `insight/...`.

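
A quick way to confirm that the download produced the expected layout is a small check script. The sketch below is not part of the repository: the helper names `missing_sources` and `count_wiki_images` are hypothetical, and it assumes only the four top-level source folders named above, with `wiki/` holding one subfolder per identity.

```python
from pathlib import Path

# Assumed top-level source folders, taken from the layout described above.
EXPECTED_SOURCES = ("wiki", "inpainting", "text2img", "insight")


def missing_sources(data_root):
    """Return the expected source folders that are absent under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_SOURCES if not (root / name).is_dir()]


def count_wiki_images(data_root):
    """Count *.jpg files stored as wiki/<identity>/*.jpg."""
    return sum(1 for _ in (Path(data_root) / "wiki").glob("*/*.jpg"))
```

For example, `missing_sources("data")` returns an empty list when all four source folders are present, and `count_wiki_images("data")` gives a rough sanity check on the size of the real-face source.
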
## Classifier — training

From the repository root:

```bash
# CPU (slow but valid)
python classifier/run.py classifier/configs/phase4/p4_convnext_tiny_100pct.json

# GPU when CUDA is available
python classifier/run.py classifier/configs/phase4/p4_convnext_tiny_100pct.json --use-gpu
```

Training uses 5-fold stratified group cross-validation. Per-fold checkpoints
are saved as `classifier/outputs/models/{run_name}_fold{k}_best.pt` (and
`_final.pt`). Override data or output locations with `--data-dir` and
`--output-root`.

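
To make the idea of a stratified group split concrete, here is a minimal greedy sketch in plain Python. It is not the repository's implementation: the function name is hypothetical, and using the face identity as the group key is an assumption. The point it illustrates is that every group lands in exactly one fold while the class counts stay roughly balanced across folds. scikit-learn offers a library version of the same idea as `StratifiedGroupKFold`.

```python
from collections import defaultdict


def stratified_group_folds(labels, groups, n_folds=5):
    """Assign each sample a fold index so that no group spans two folds.

    Greedy heuristic: place the largest groups first, each into the fold
    where the group's dominant class is currently least represented
    (ties broken by total fold size).
    """
    # Per-group class counts, e.g. {"id_7": {0: 3, 1: 9}}.
    group_counts = defaultdict(lambda: defaultdict(int))
    for label, group in zip(labels, groups):
        group_counts[group][label] += 1

    classes = sorted(set(labels))
    fold_counts = [{c: 0 for c in classes} for _ in range(n_folds)]
    fold_of_group = {}

    # Largest groups first so that small groups can even things out later.
    ordered = sorted(
        group_counts, key=lambda g: (-sum(group_counts[g].values()), str(g))
    )
    for group in ordered:
        counts = group_counts[group]
        dominant = max(classes, key=lambda c: counts.get(c, 0))
        best = min(
            range(n_folds),
            key=lambda f: (fold_counts[f][dominant], sum(fold_counts[f].values())),
        )
        fold_of_group[group] = best
        for c, n in counts.items():
            fold_counts[best][c] += n

    return [fold_of_group[g] for g in groups]
```

Fold `k` then serves as the validation set for the `k`-th run, and the remaining folds form its training set.
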
**Primary delivery model** (best Phase 4 detector): config
`classifier/configs/phase4/p4_convnext_tiny_100pct.json` with per-fold
weights `classifier/outputs/models/p4_convnext_tiny_100pct_fold*_best.pt`.

## Classifier — inference

Classify a single image as real or fake:

```bash
python classifier/tools/inference.py image.jpg classifier/configs/phase4/p4_convnext_tiny_100pct.json
```

This loads the config and the matching checkpoint, runs the image through the
model, and prints a result like:

```
Image : image.jpg
Model : p4_convnext_tiny_100pct (convnext_tiny)
Device: cuda
Result: FAKE (confidence: 74.7%)
P(fake): 0.7466 P(real): 0.2534
```

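
The relationship between the label, the confidence, and the two probabilities in that report can be sketched as follows. This is an illustration under assumptions, not the tool's code: `format_result` is a hypothetical helper, and the 0.5 decision threshold is assumed rather than taken from the repository.

```python
def format_result(p_fake):
    """Turn P(fake) into a label, a confidence, and P(real)."""
    if not 0.0 <= p_fake <= 1.0:
        raise ValueError("p_fake must be a probability in [0, 1]")
    p_real = 1.0 - p_fake
    # Assumed decision rule: predict the more probable class (threshold 0.5).
    label = "FAKE" if p_fake >= 0.5 else "REAL"
    # Confidence is the probability of the predicted class.
    confidence = max(p_fake, p_real)
    return label, confidence, p_real


label, confidence, p_real = format_result(0.7466)
print(f"Result: {label} (confidence: {confidence:.1%})")
print(f"P(fake): {0.7466:.4f}  P(real): {p_real:.4f}")
```

Under this reading, the confidence is the probability of whichever class was predicted, so it never drops below 50% for a two-class decision.
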
If you omit `--checkpoint`, the tool automatically looks for a saved
checkpoint under `classifier/outputs/models/` in this order: the single-run
`{run_name}_best.pt`, then the CV fold files `{run_name}_fold{k}_best.pt`,
then `{run_name}_fold{k}_final.pt`. To use a specific fold:

```bash
python classifier/tools/inference.py image.jpg classifier/configs/phase4/p4_convnext_tiny_100pct.json \
    --checkpoint classifier/outputs/models/p4_convnext_tiny_100pct_fold0_best.pt
```

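
The automatic lookup order can be pictured with a short sketch. The function `find_checkpoint` below is hypothetical and only mirrors the order described above; in particular, picking the lowest-sorting fold file when several folds exist is an assumption, not documented behavior of `inference.py`.

```python
from pathlib import Path


def find_checkpoint(run_name, models_dir="classifier/outputs/models"):
    """Return the first checkpoint matching the documented order, or None."""
    models = Path(models_dir)

    # 1. Single-run checkpoint.
    single = models / f"{run_name}_best.pt"
    if single.is_file():
        return single

    # 2. Best checkpoint of a CV fold, then 3. final checkpoint of a CV fold.
    for suffix in ("best", "final"):
        # Assumption: when several folds exist, take the lowest-sorting name.
        matches = sorted(models.glob(f"{run_name}_fold*_{suffix}.pt"))
        if matches:
            return matches[0]
    return None
```

Passing `--checkpoint` explicitly sidesteps this search and is the safer choice when a specific fold matters.
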
## Generator — training

From the repository root:

```bash
python generator/run.py generator/configs/phase0/p0_vae.json
python generator/run.py generator/configs/phase0/p0_ddpm.json
```

Generator training expects real-face images (default source is `wiki`); use
`--data-dir` to point at your dataset tree. Checkpoints are saved under
`generator/outputs/models/` as `{run_name}_final_ema.pt` (EMA shadow) and
`{run_name}_best_ema.pt` (lowest-FID snapshot).

## Generator — inference (sampling)

Generate 4×4 sample grids from Phase 5 EMA checkpoints:

```bash
python generator/tools/sampling.py --models p5_gan p5_vae p5_ddpm --samples 10
```

Options:

- `--models`: which models to sample from (`p5_gan`, `p5_vae`, `p5_ddpm`;
  defaults to all three).
- `--samples`: number of grids per model (default 10).
- `--output-dir`: where to write the PNGs (default
  `generator/outputs/samples/final_comparison/`).
- `--truncation`: optional latent truncation for the GAN (lower values give
  less diversity but sharper samples).
- `--device`: `cuda` or `cpu` (default: auto-detect).

Each grid is a 4×4 PNG of 16 images sampled from the model's EMA weights.
GAN samples are drawn from random latent vectors, VAE samples decode from the
learned prior, and DDPM samples use 50-step DDIM.

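
For readers unfamiliar with latent truncation, one common form of it resamples any latent coordinate whose magnitude exceeds a threshold. The sketch below shows that variant in plain Python. It is an assumption about what `--truncation` might do, not a description of `sampling.py`, which could equally scale the latent vector instead. The function name and signature are hypothetical.

```python
import random


def sample_truncated_latent(dim, truncation=None, rng=None):
    """Draw a latent vector from N(0, I), optionally truncated.

    With a truncation value t, any coordinate with |z| > t is redrawn, so
    a smaller t keeps samples closer to the latent mean.
    """
    rng = rng or random.Random()
    if truncation is not None and truncation <= 0:
        raise ValueError("truncation must be positive")
    latent = []
    for _ in range(dim):
        z = rng.gauss(0.0, 1.0)
        # Resample until the coordinate falls inside [-t, t].
        while truncation is not None and abs(z) > truncation:
            z = rng.gauss(0.0, 1.0)
        latent.append(z)
    return latent
```

Latents near the mean tend to decode to more typical faces, which is why a lower truncation value trades diversity for sharpness.
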
## Final takeaway

The project is best understood as a sequence of controlled decisions:

1. define the data and preprocessing cleanly;
2. establish simple baselines;
3. improve one factor at a time;
4. compare model families using saved evidence;
5. report both performance and limitations.

The classifier becomes reliable through source-aware preprocessing, stronger
pretrained backbones, and scaling. The generator improves by first locking the
face-aligned pipeline and then selecting the best recipe inside each model
family before the final GAN/VAE/DDPM comparison.