# Pipeline

Orchestrates ephemeral Vast.ai GPU instances: searches for an offer, creates the instance, syncs the project, trains, downloads `outputs/`, and destroys the instance automatically. Generator runs also rsync `generator/outputs/` every 50 epochs while training is still running.

## One-time setup

Create `pipeline/.env`:

```dotenv
VAST_API_KEY=your-vast-api-key
# Optional; this path is the default
VAST_SSH_PRIVATE_KEY=/home/you/.ssh/id_ed25519
```

The matching `.pub` file must exist alongside the private key. If that public key is not yet registered with Vast.ai, the pipeline registers it automatically.

## Commands

### `run` — train on a remote GPU and fetch results

```
python -m pipeline run <configs> [options]
```

Accepts one or more config paths, or a single directory (all `*.json` files inside it, in sorted order). Duplicate configs (identical training settings after resolving `extends` and `shared.json`) are skipped automatically.

In the Default column, *config* means the value is taken from the pipeline config (`pipeline/defaults/vast.json`, merged with any `--pipeline-config` overrides).

| Flag | Default | Description |
|------|---------|-------------|
| `configs` | *(required)* | One or more config paths, or a directory of JSON configs |
| `--download-data` | off | Download the DFF dataset via HuggingFace on the remote before training |
| `--send-cropped` | off | Rsync local `cropped/classifier/` or `cropped/generator/` to the remote, depending on the config |
| `--select-offer` | off | Interactively browse and pick the GPU offer |
| `--sort` | config | Ranking mode: `price`, `performance`, or `dlp_per_dollar` |
| `--region TEXT` | any | Filter by region, e.g. `europe`, `Portugal`, `US` |
| `--price FLOAT` | config | Max hourly price cap in USD |
| `--dry-run` | off | Print matching offers without creating an instance |
| `--keep-on-failure` | off | Do not destroy the instance if training fails |
| `--no-gpu` | off | Disable GPU training on the remote (use CPU instead) |
| `--select-template` | off | Interactively choose a Vast.ai Docker template |
| `--template HASH` | config | Use a specific template hash ID |
| `--pipeline-config PATH` | none | JSON file that overrides `pipeline/defaults/vast.json` |

**Examples:**

```bash
# Best-ranked RTX 3090 / 3090 Ti offer in Europe (default ranking), download data on the remote
python -m pipeline run configs/resnet18.json --region europe --download-data

# Browse offers interactively, sorted by price
python -m pipeline run configs/resnet18.json --select-offer --sort price

# Run all configs in a directory sequentially on one instance
python -m pipeline run configs/phase2/ --region europe

# See which offers would be selected without spending money
python -m pipeline run configs/resnet18.json --dry-run --region europe

# Keep the instance alive if something goes wrong (for debugging)
python -m pipeline run configs/resnet18.json --keep-on-failure

# Cap the price at $0.12/h
python -m pipeline run configs/resnet18.json --price 0.12
```

### `offers` — inspect available GPU offers

```
python -m pipeline offers [options]
```

| Flag | Default | Description |
|------|---------|-------------|
| `--sort` | config | Ranking mode: `price`, `performance`, or `dlp_per_dollar` |
| `--region TEXT` | any | Region filter |
| `--price FLOAT` | config | Max hourly price cap in USD |
| `--select-offer` | off | Interactive offer picker (prints the selected offer as JSON) |
| `--list-regions` | off | Print a count of available offers per region and exit |
| `--limit-output INT` | 10 | How many offers to print |
| `--pipeline-config PATH` | none | Pipeline config override |

**Examples:**

```bash
# See the 20 best-value offers under $0.15/h in Europe
python -m pipeline offers --region europe --price 0.15 --limit-output 20

# List which regions have matching GPUs
python -m pipeline offers --list-regions

# Interactive picker — useful before committing to a run
python -m pipeline offers --select-offer --sort price
```

### `up` — create an instance without training

Spins up an instance and prints SSH connection details. Useful for manual experiments or debugging.

```
python -m pipeline up [options]
```

| Flag | Default | Description |
|------|---------|-------------|
| `--label TEXT` | auto | Optional label for the instance |
| `--select-template` | off | Interactively choose a Vast.ai Docker template |
| `--template HASH` | config | Use a specific template hash ID |
| `--pipeline-config PATH` | none | Pipeline config override |

```bash
python -m pipeline up
python -m pipeline up --label my-debug-session
```

### `status` — show instance details

```
python -m pipeline status [--pipeline-config PATH]
```

### `down` — destroy an instance

```
python -m pipeline down [--pipeline-config PATH]
```

## Pipeline config overrides

Pass `--pipeline-config my_overrides.json` to override any field from `pipeline/defaults/vast.json`. The override is deep-merged: only the fields you specify change, and everything else keeps its default. This is useful for switching GPU types or raising the price cap for a single run without editing the defaults file.
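The deep-merge behaviour can be pictured with a small sketch. It assumes a plain recursive merge in which nested objects are merged key by key, while scalars and lists in the override replace the default outright. The `deep_merge` helper and the trimmed-down `DEFAULTS` dict are hypothetical illustrations, not the pipeline's code, and the real list handling may differ.

```python
import copy
import json
from typing import Any


def deep_merge(defaults: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict with `overrides` merged recursively into `defaults`.

    Hypothetical helper: nested dicts are merged key by key; any other value
    (scalar or list) in `overrides` replaces the default value outright.
    Neither input is mutated.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Trimmed-down stand-in for pipeline/defaults/vast.json (values from the key-fields table).
DEFAULTS: dict[str, Any] = {
    "search": {
        "gpu_names": ["RTX 3090", "RTX 3090 Ti"],
        "max_dph_total": 0.40,
        "sort_mode": "dlp_per_dollar",
        "min_reliability": 0.98,
    },
    "instance": {"disk_gb": 48, "image": "vastai/pytorch:latest"},
}

# What a my_overrides.json passed via --pipeline-config might contain.
OVERRIDES = json.loads('{"search": {"gpu_names": ["RTX 4090"], "max_dph_total": 0.45}}')

config = deep_merge(DEFAULTS, OVERRIDES)
print(config["search"])    # overridden keys changed, sort_mode/min_reliability kept
print(config["instance"])  # untouched section keeps its defaults
```

Under this assumption an override list replaces the whole default list rather than extending it, so `"gpu_names": ["RTX 4090"]` would accept only RTX 4090 offers; to keep the 3090s as well, list them in the override too.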
**Example — allow RTX 4090, higher price cap:**

```json
{
  "search": {
    "gpu_names": ["RTX 4090"],
    "max_dph_total": 0.45
  }
}
```

**Key fields in `pipeline/defaults/vast.json`:**

| Section | Key | Default | Meaning |
|---------|-----|---------|---------|
| `search` | `gpu_names` | `["RTX 3090", "RTX 3090 Ti"]` | Accepted GPU models |
| `search` | `max_dph_total` | `0.40` | Max price per hour (USD) |
| `search` | `sort_mode` | `"dlp_per_dollar"` | Default ranking (`price`, `performance`, or `dlp_per_dollar`) |
| `search` | `min_reliability` | `0.98` | Minimum host reliability score |
| `instance` | `disk_gb` | `48` | Disk size provisioned on the instance (GB) |
| `instance` | `image` | `"vastai/pytorch:latest"` | Docker image |
| `remote` | `workspace_dir` | `"/workspace/DRL_PROJ"` | Remote working directory |
| `remote` | `ssh_timeout_seconds` | `900` | How long to wait for SSH to become available |

## Full workflow example

```bash
# 1. Check what's available and how much it costs
python -m pipeline offers --region europe --list-regions
python -m pipeline offers --region europe --sort price --limit-output 20

# 2. Run training (auto-selects the best offer and downloads the dataset on the remote)
python -m pipeline run configs/resnet18.json --region europe --download-data

# 3. Results land in classifier/outputs/ automatically
```
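When `run` auto-selects an offer, the `search` fields narrow the candidates and the sort mode orders them. The sketch below illustrates that interplay; it is not the pipeline's implementation. The `Offer` record, its keys (`gpu_name`, `dph_total`, `reliability`, `dlperf`), the `rank_offers` helper, and the sample numbers are all hypothetical, loosely modelled on Vast.ai offer fields. It also assumes `performance` ranks by a raw DLPerf score and `dlp_per_dollar` by DLPerf divided by hourly price. Region filtering is not modelled.

```python
from typing import Callable, TypedDict


class Offer(TypedDict):
    """Hypothetical, simplified offer record (not the Vast.ai API schema)."""
    id: int
    gpu_name: str
    dph_total: float    # total price per hour, USD
    reliability: float  # host reliability score in [0, 1]
    dlperf: float       # deep-learning performance score (assumed metric)


# Assumed meaning of each ranking mode; smaller key = better rank.
SORT_KEYS: dict[str, Callable[[Offer], float]] = {
    "price": lambda o: o["dph_total"],                          # cheapest first
    "performance": lambda o: -o["dlperf"],                      # fastest first
    "dlp_per_dollar": lambda o: -o["dlperf"] / o["dph_total"],  # best value first
}


def rank_offers(
    offers: list[Offer],
    gpu_names: list[str],
    max_dph_total: float,
    min_reliability: float,
    sort_mode: str = "dlp_per_dollar",
) -> list[Offer]:
    """Drop offers that violate the search constraints, then sort best-first."""
    if sort_mode not in SORT_KEYS:
        raise ValueError(f"unknown sort mode: {sort_mode!r}")
    eligible = [
        o for o in offers
        if o["gpu_name"] in gpu_names
        and o["dph_total"] <= max_dph_total
        and o["reliability"] >= min_reliability
    ]
    return sorted(eligible, key=SORT_KEYS[sort_mode])


# Made-up offers for illustration only.
OFFERS: list[Offer] = [
    {"id": 1, "gpu_name": "RTX 3090", "dph_total": 0.20, "reliability": 0.99, "dlperf": 30.0},
    {"id": 2, "gpu_name": "RTX 3090 Ti", "dph_total": 0.35, "reliability": 0.99, "dlperf": 45.0},
    {"id": 3, "gpu_name": "RTX 3090", "dph_total": 0.25, "reliability": 0.99, "dlperf": 40.0},
    {"id": 4, "gpu_name": "RTX 3090", "dph_total": 0.15, "reliability": 0.90, "dlperf": 29.0},  # unreliable host
    {"id": 5, "gpu_name": "RTX 4090", "dph_total": 0.38, "reliability": 0.99, "dlperf": 70.0},  # GPU not accepted
]

# Default `search` values from the key-fields table.
SEARCH_DEFAULTS = {"gpu_names": ["RTX 3090", "RTX 3090 Ti"], "max_dph_total": 0.40, "min_reliability": 0.98}

for mode in SORT_KEYS:
    print(mode, [o["id"] for o in rank_offers(OFFERS, sort_mode=mode, **SEARCH_DEFAULTS)])
# price [1, 3, 2]
# performance [2, 3, 1]
# dlp_per_dollar [3, 1, 2]
```

With these made-up numbers each mode puts a different offer first, which is one reason to check `--dry-run` or `offers --sort ...` output before a paid run.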